Preface


This volume contains the papers presented at the 11th International Joint Conference on 
Automated Reasoning (IJCAR 2022) held during August 8–10, 2022, in Haifa, Israel.
IJCAR was part of the Federated Logic Conference (FLoC 2022), which took place from 
July 31 to August 12, 2022, in Haifa. 

IJCAR is the premier international joint conference on all aspects of automated 
reasoning, including foundations, implementations, and applications, comprising several 
leading conferences and workshops. IJCAR 2022 united the Conference on Automated 
Deduction (CADE), the International Symposium on Frontiers of Combining Systems 
(FroCoS), and the International Conference on Automated Reasoning with Analytic 
Tableaux and Related Methods (TABLEAUX). Previous IJCAR conferences were held 
in Siena, Italy, in 2001, Cork, Ireland, in 2004, Seattle, USA, in 2006, Sydney, Australia, 
in 2008, Edinburgh, UK, in 2010, Manchester, UK, in 2012, Vienna, Austria, in 2014, 
Coimbra, Portugal, in 2016, Oxford, UK, in 2018, and Paris, France, in 2020 (virtual). 

There were 85 submissions. Each submission was assigned to at least three Program 
Committee members and was reviewed in single-blind mode. The committee decided to 
accept 41 papers: 32 regular papers and nine system descriptions. 

The program also included two invited talks, by Elvira Albert and Gilles Dowek, as 
well as a plenary FLoC talk by Aarti Gupta. 

We acknowledge the FLoC sponsors: 


Diamond sponsors: Amazon Web Services, Meta, Intel 

Gold sponsors: Google, Nvidia, Synopsys 

Silver sponsor: Cadence 

Bronze sponsors: DLVSystem, Veridise 

Other sponsors: Technion, The Henry and Marilyn Taub Faculty of Computer Science 


We also acknowledge the generous sponsorship of Springer and the Trakhtenbrot 
family, as well as the invaluable support provided by the EasyChair developers.
Finally, we thank the FLoC 2022 organization team for assisting us with local
organization and general conference management.
Using Automated Reasoning Techniques
for Enhancing the Efficiency and Security 
of (Ethereum) Smart Contracts 


Elvira Albert (1,2), Pablo Gordillo (1), Alejandro Hernández-Cerezo (1),
Clara Rodríguez-Núñez (1), and Albert Rubio (1,2)


1 Complutense University of Madrid, Madrid, Spain 
2 Instituto de Tecnología del Conocimiento, Madrid, Spain
elvira@fdi.ucm.es 


The use of the Ethereum blockchain platform [17] has experienced enormous
growth since its very first transaction back in 2015 and, along with it,
the verification and optimization of the programs executed in the blockchain
(known as Ethereum smart contracts) have raised considerable interest within
the research community. As for any other kind of program, the main properties
of smart contracts are their efficiency and security. However, in the context of
the blockchain, these properties acquire even more relevance. As regards
efficiency, due to the huge volume of transactions, the cost and response time of
the Ethereum blockchain platform have increased notably: transaction processing
capacity is limited, which results in low transaction rates per minute together
with increased costs per transaction. Ethereum is aware of such limitations and
is currently working on solutions to improve scalability with the goal of
increasing its capacity. As regards security, due to the public nature
and immutability of smart contracts and the fact that their public functions can
be executed by any user at any time, programming errors can be exploited by
attackers and have a high economic impact [7,13]. Verification is key to ensuring
the security of smart contract execution and to providing safety guarantees. This
talk will present our work on the use of automated reasoning techniques and
tools to enhance the security and efficiency [2–4,6] of Ethereum smart contracts
along the two directions described below.


Security. Our main focus on security will be to detect and avoid potential
reentrancy attacks. Reentrancy is one of the best-known and most exploited
vulnerabilities, and it has led to attacks in the Ethereum ecosystem that became
infamous for their economic impact [9,11,15]. Reentrancy attacks might occur in
programs with callbacks,
a mechanism that allows making calls among contracts. Callbacks occur when a 
method of a contract invokes a method of another contract and the latter, either 
directly or indirectly, invokes one or more methods of the former before the
original method invocation returns. While this mechanism is useful and powerful
in event-driven programming, it has been used to exploit vulnerabilities.

This work was funded partially by the Ethereum Foundation (Grant FY21-0372), the
Spanish MCIU, AEI and FEDER (EU) project RTI2018-094403-B-C31 and by the CM
project S2018/TCS-4314 co-funded by EIE Funds of the European Union.

Our approach to detect potential reentrancy problems is to ensure that the program
meets the Effectively Callback Freeness (ECF) property [10]. ECF guarantees the 
modularity of a contract in the sense that executions with callbacks cannot result 
in new states that are not reachable by callback free executions. This implies 
that the use of callbacks will not lead to unpredicted, potentially dangerous, 
states. In order to ensure the ECF property, we use commutation and projection 
of fragments of code [6]. Intuitively, given a function fragment A followed by B 
(denoted A.B), in case we can receive a callback to some function f between 
these fragments (that is, A.f.B), we ensure safety by proving that this execu- 
tion that contains callbacks is equivalent to a callback free execution: either to 
A.B (projection), f.A.B (left-commutation) or A.B.f (right-commutation). The 
use of automated reasoning techniques enables proving properties of this kind.
Inspired by the use of SMT solvers to prove redundancy of concurrent executions 
[1,8,16], we have implemented such checks using state-of-the-art SMT solvers. 
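To make the projection and commutation checks concrete, here is a minimal sketch. It is not the SMT-based implementation described above: as a simplifying assumption, the contract state is a toy pair `(balance, paid)`, code fragments are plain functions on that state, and equivalence is checked by exhaustive enumeration over a small bounded domain instead of with an SMT solver. All names (`run`, `ecf_witness`, `pay`, `zero`, ...) are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Optional

State = tuple[int, int]            # hypothetical toy state: (balance, paid_out)
Fragment = Callable[[State], State]


def run(state: State, *fragments: Fragment) -> State:
    """Execute the fragments in order, threading the state through them."""
    for frag in fragments:
        state = frag(state)
    return state


def ecf_witness(a: Fragment, b: Fragment, f: Fragment,
                states: Iterable[State]) -> Optional[str]:
    """Name the callback-free execution that A.f.B is equivalent to, or None."""
    states = list(states)
    candidates = {
        "projection": (a, b),            # A.B
        "left-commutation": (f, a, b),   # f.A.B
        "right-commutation": (a, b, f),  # A.B.f
    }
    for name, order in candidates.items():
        if all(run(s, a, f, b) == run(s, *order) for s in states):
            return name
    return None


STATES = list(product(range(4), repeat=2))  # bounded domain for the brute-force check

# Vulnerable pattern: A sends the balance, B zeroes it afterwards,
# and the callback f re-enters the whole withdrawal (A.B).
pay: Fragment = lambda s: (s[0], s[1] + s[0])
zero: Fragment = lambda s: (0, s[1])
withdraw: Fragment = lambda s: run(s, pay, zero)

# Safer pattern: A updates the state before the external call; B does nothing.
safe_pay: Fragment = lambda s: (0, s[1] + s[0])
noop: Fragment = lambda s: s
safe_withdraw: Fragment = lambda s: run(s, safe_pay, noop)

print(ecf_witness(pay, zero, withdraw, STATES))            # None
print(ecf_witness(safe_pay, noop, safe_withdraw, STATES))  # projection
```

In the vulnerable version the reentrant callback pays the balance out twice, so no callback-free order matches; updating the state before the external call makes the callback unobservable. A bounded enumeration like this only approximates the unbounded check that an SMT solver performs.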

The ECF property can be generalized to allow callbacks to introduce new 
behaviors as long as they are benign, as [5] does by defining the notion of R-ECF. 
The main difference between ECF and R-ECF is that while ECF checks that 
the states reached by executions with callbacks are exactly the same as the ones 
reached by executions that do not contain callbacks, R-ECF checks that they 
satisfy a relation with respect to the states reached without callbacks. This way, 
R-ECF is able to recognize and distinguish the benign behaviors introduced by 
callbacks from the ones that are potentially dangerous, while ECF cannot. The 
main application of R-ECF is that, given an invariant of the program, it
reduces the problem of verifying the invariant in the presence of callbacks
to the callback-free setting. For example, if we consider the invariant
$\mathit{balance} > 0$ and prove that the contract is R-ECF with respect to the
relation $\mathit{balance}_{cb} > \mathit{balance}_{free}$ (i.e., the balance
reached by executions with callbacks is greater than the one reached without
callbacks), then we only need to consider callback-free executions in order to
prove the preservation of the invariant.
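The difference between ECF and R-ECF can be illustrated with the same kind of toy model. In this sketch (an assumption for illustration, with hypothetical names), the relation is only checked against the projection A.B, and the bounded enumeration again stands in for the SMT solver: the callback `donate` changes the reached state, so plain ECF fails, but it only increases the balance, so the relation $\mathit{balance}_{cb} > \mathit{balance}_{free}$ holds.

```python
from itertools import product
from typing import Callable, Iterable

State = tuple[int, int]            # hypothetical toy state: (balance, donations)
Fragment = Callable[[State], State]
Relation = Callable[[State, State], bool]


def run(state: State, *fragments: Fragment) -> State:
    """Execute the fragments in order, threading the state through them."""
    for frag in fragments:
        state = frag(state)
    return state


def is_r_ecf(a: Fragment, b: Fragment, f: Fragment,
             relation: Relation, states: Iterable[State]) -> bool:
    """Check relation(A.f.B(s), A.B(s)) for every state s in the bounded domain."""
    return all(relation(run(s, a, f, b), run(s, a, b)) for s in states)


STATES = list(product(range(4), repeat=2))

# A withdraws one unit when possible, B does nothing, the callback f donates one unit.
withdraw_one: Fragment = lambda s: (s[0] - 1, s[1]) if s[0] >= 1 else s
noop: Fragment = lambda s: s
donate: Fragment = lambda s: (s[0] + 1, s[1] + 1)

balance_greater: Relation = lambda cb, free: cb[0] > free[0]  # balance_cb > balance_free
same_state: Relation = lambda cb, free: cb == free             # plain ECF (projection)

print(is_r_ecf(withdraw_one, noop, donate, balance_greater, STATES))  # True
print(is_r_ecf(withdraw_one, noop, donate, same_state, STATES))       # False
```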

As benchmarks, we considered the top 150 contracts by volume of usage and
studied the modularity of their functions in terms of ECF and R-ECF. A total
of 386 of their functions were susceptible to callbacks, of which 62.7% were
verified to be ECF. The R-ECF approach increased the accuracy of the analysis,
proving the correctness of an extra 2% of functions [5,6].


Efficiency. The main focus on efficiency will be on optimizing the resource 
consumption of smart contract executions. On the Ethereum blockchain, the 
resource consumption is measured in terms of gas, a unit introduced in the sys- 
tem to quantify the computational effort and charge a fee accordingly in order 
to have a transaction executed. To understand how we can optimize gas, we 
need to discuss it (and do it) at the level of the Ethereum bytecode. Smart con- 
tracts in Ethereum are executed using the Ethereum Virtual Machine (EVM). 
The EVM is a simple stack-based architecture which uses 256-bit words and
has its own repertoire of instructions (EVM opcodes). In the EVM, the memory
model is split into two different structures: the storage, which is persistent
between transactions and expensive to use; and the memory, which does not
persist between transactions and is cheaper. Each opcode has a gas cost
associated with its execution. In addition, a fee must be paid for each byte of
the smart contract when it is deployed. Thus, the resource to be optimized can be
either the total amount of gas consumed by a program or its size. Even though both
criteria are usually related, there are some situations in which they do not
correlate. For instance, pushing a big number onto the stack consumes a small
amount of gas but significantly increases the bytecode size, whereas obtaining the
same value using arithmetic operations is more expensive but involves fewer bytes.
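The gas/size trade-off in the last sentence can be checked with a tiny model. The costs below follow our reading of the standard EVM fee schedule (PUSH1, PUSH32 and NOT cost 3 gas each, and PUSHn occupies 1 + n bytes); the interpreter and its names are hypothetical and support only these three opcodes.

```python
# Gas cost and encoded size (opcode byte plus immediate bytes) of the opcodes used below.
GAS = {"PUSH1": 3, "PUSH32": 3, "NOT": 3}
SIZE = {"PUSH1": 2, "PUSH32": 33, "NOT": 1}
WORD = 2 ** 256


def evaluate(program):
    """Run a straight-line program of (opcode, immediate) pairs; return the stack."""
    stack = []
    for op, imm in program:
        if op in ("PUSH1", "PUSH32"):
            stack.append(imm % WORD)
        elif op == "NOT":
            stack.append(~stack.pop() % WORD)   # bitwise NOT on a 256-bit word
        else:
            raise ValueError(f"unsupported opcode {op}")
    return stack


def gas(program):
    return sum(GAS[op] for op, _ in program)


def size(program):
    return sum(SIZE[op] for op, _ in program)


push_big = [("PUSH32", WORD - 1)]          # push 0xff...ff directly
computed = [("PUSH1", 0), ("NOT", None)]   # compute the same value as NOT(0)

assert evaluate(push_big) == evaluate(computed) == [WORD - 1]
print(gas(push_big), size(push_big))   # 3 33
print(gas(computed), size(computed))   # 6 3
```

Both programs leave the same word on the stack: the first is cheaper in gas, the second is ten times smaller, so the optimal choice depends on the criterion.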

Among all possible techniques to optimize code, we have used the technique
known as superoptimization [12]. The main idea of superoptimization is to
automatically find an optimal sequence of instructions equivalent to a given
loop-free sequence. To achieve this goal, we enumerate all possible candidates
and determine the best option among them with respect to the optimization
criteria. In the context of the EVM, there exist several superoptimizers: EBSO [14],
SYRUP [3,4] and GASOL [2]. The techniques presented in this work correspond
to the ones implemented in GASOL, which improve and extend those in SYRUP.
We apply two kinds of automated reasoning techniques to superoptimize Ethereum
smart contracts, symbolic execution and Max-SMT, as described next.
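The core idea of superoptimization (enumerate candidates, keep the cheapest equivalent one) can be sketched by brute force. This is not GASOL's Max-SMT search: as assumptions, the opcode alphabet is a toy subset, equivalence means equal symbolic stacks with only commutativity normalized (so some equivalences are missed), and the gas costs follow our reading of the EVM fee schedule. Function names are hypothetical.

```python
from itertools import product

# Gas costs for the toy opcode alphabet
GAS = {"ADD": 3, "MUL": 5, "POP": 2, "DUP1": 3, "SWAP1": 3}


def step(stack, op):
    """Symbolically execute one opcode; return None on stack underflow."""
    if op in ("ADD", "MUL"):
        if len(stack) < 2:
            return None
        # both operations are commutative: sort operands into a canonical order
        x, y = sorted((stack[-1], stack[-2]), key=repr)
        return stack[:-2] + [(op, x, y)]
    if op == "POP":
        return stack[:-1] if stack else None
    if op == "DUP1":
        return stack + [stack[-1]] if stack else None
    if op == "SWAP1":
        return stack[:-2] + [stack[-1], stack[-2]] if len(stack) >= 2 else None
    raise ValueError(f"unsupported opcode {op}")


def execute(seq, n_inputs):
    """Run seq on symbolic inputs s0..s(n-1); the end of the list is the stack top."""
    stack = [f"s{i}" for i in range(n_inputs)]
    for op in seq:
        stack = step(stack, op)
        if stack is None:
            return None
    return stack


def cost(seq):
    return sum(GAS[op] for op in seq)


def superoptimize(block, n_inputs, max_len):
    """Cheapest sequence of length <= max_len with the same symbolic result as block."""
    target = execute(block, n_inputs)
    best = tuple(block)
    for length in range(max_len + 1):
        for cand in product(GAS, repeat=length):
            if cost(cand) < cost(best) and execute(cand, n_inputs) == target:
                best = cand
    return best


print(superoptimize(["SWAP1", "ADD"], n_inputs=2, max_len=2))  # ('ADD',)
print(superoptimize(["DUP1", "POP"], n_inputs=1, max_len=2))   # ()
```

The number of candidates grows exponentially with `max_len`, which is why encoding the search as a Max-SMT problem, as described next, is attractive.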


– Symbolic execution is used to obtain a representation of how the stack and
memory evolve with respect to an initial stack. We determine the minimum
stack size needed to perform all the operations in a block and apply symbolic
execution to an initial stack containing that number of unknown stack variables.
Opcodes representing operations that do not manage the stack are left as
uninterpreted functions. Then, we apply as many simplification rules as possible
from a fixed set of rules. Depending on the chosen criterion, some rules are
disabled if they lead to worse candidates. Moreover, we apply static analysis
to memory opcodes to determine whether a block contains redundant store or
load operations that can be safely removed or replaced. This leads to a
simplified specification of the optimal block.

– The second technique involves synthesizing the optimal block from a given
symbolic representation using a Max-SMT solver. The synthesis problem is
expressed as a first-order formula in which every model corresponds to a
valid equivalent block. Our encoding is expressed in the simple logic QF_IDL,
so that the Max-SMT solver can reason effectively on EVM blocks. In this
encoding, the length of the sequence of instructions is fixed by an upper
bound so that quantifiers are avoided. NOP operations are included in the
encoding to allow shorter sequences. The state of the stack is represented
explicitly for each position in the sequence. Every instruction in the block and
every basic stack operation has a constraint that reflects its impact on the
stack for each possible position. Memory accesses are encoded as a partial
order relation that captures the dependencies among them.
Regarding the optimization process, we express the cost (gas or byte size) of
each instruction using soft constraints. For both criteria, the corresponding
set of soft constraints ensures that an optimal model returned by the solver
corresponds to an optimal block for that criterion.
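The first technique above can be illustrated on a toy instruction set. This sketch (hypothetical names; only ADD is interpreted, and uninterpreted functions and memory analysis are omitted) computes the minimum initial stack size of a block, runs it on that many unknown stack variables, and applies a small fixed set of simplification rules: constant folding modulo 2^256, removal of additions of zero, and a canonical operand order for the commutative ADD.

```python
WORD = 2 ** 256
# (pops, pushes) for the opcodes of this toy model
ARITY = {"PUSH": (0, 1), "POP": (1, 0), "DUP1": (1, 2), "SWAP1": (2, 2), "ADD": (2, 1)}


def min_stack_size(block):
    """Smallest initial stack height for which no instruction underflows."""
    height = needed = 0          # height is relative to the unknown initial stack
    for op, _ in block:
        pops, pushes = ARITY[op]
        needed = max(needed, pops - height)
        height += pushes - pops
    return needed


def simplify_add(a, b):
    """Apply a fixed set of rewrite rules to ADD(a, b)."""
    if isinstance(a, int) and isinstance(b, int):
        return (a + b) % WORD                 # constant folding modulo 2^256
    if a == 0:
        return b                              # 0 + x -> x
    if b == 0:
        return a                              # x + 0 -> x
    x, y = sorted((a, b), key=repr)           # commutativity: canonical operand order
    return ("ADD", x, y)


def symbolic_exec(block):
    """Symbolically execute a block on unknown stack variables s0 (top), s1, ..."""
    n = min_stack_size(block)
    stack = [f"s{i}" for i in reversed(range(n))]  # the end of the list is the top
    for op, arg in block:
        if op == "PUSH":
            stack.append(arg)
        elif op == "POP":
            stack.pop()
        elif op == "DUP1":
            stack.append(stack[-1])
        elif op == "SWAP1":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append(simplify_add(a, b))
        else:
            raise ValueError(f"unsupported opcode {op}")
    return stack


block = [("PUSH", 0), ("ADD", None), ("PUSH", 2), ("PUSH", 3), ("ADD", None), ("ADD", None)]
print(min_stack_size(block))   # 1
print(symbolic_exec(block))    # [('ADD', 's0', 5)]
```

The resulting stack term is the simplified specification: any block computing `s0 + 5` from one input is a valid candidate, which is what the Max-SMT encoding then searches over.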


Combining both approaches, we obtain significant savings for both criteria.
For a subset of 30 smart contracts, selected among the latest published in
Etherscan as of June 21, 2021 and already optimized by the solc v0.8.9 compiler,
GASOL still reduces the amount of gas by 0.72% with the gas criterion enabled
and the overall size by 3.28% with the size criterion enabled.


Future work. The current directions for future work include enhancing the
performance of the smart contract optimizer in both accuracy and scalability of
the process while keeping it efficient. For accuracy, we are currently working
on adding further reasoning on non-stack operations while staying within a
fairly simple logic. This will allow us to consider a wider set of equivalent
blocks and hence increase the savings. Scalability can be threatened when we
consider large blocks of code. We are investigating different approaches to scale
better, including heuristics to partition the blocks into smaller sub-blocks and
more efficient SMT encodings, among others. Finally, another direction for future
work is to formally prove the correctness of the optimizer, i.e., developing a
checker that can formally prove the equivalence of the optimized and the original
(Ethereum) bytecode. For this, we are planning to use the Coq proof assistant, in
which we will develop a checker that, given an original bytecode (corresponding
to a block of the control flow graph) and its optimization, can formally prove
their equivalence for any possible execution and, optionally, generate a
soundness proof that can be used as a certificate.
1 Yet Another Crisis of the Universality of Mathematical
Truth 


The development of computerized proof systems, such as COQ, MATITA, AGDA, 
LEAN, HOL 4, HOL LIGHT, ISABELLE/HOL, MIZAR, etc. is a major step 
forward in the never-ending quest for mathematical rigor. But it jeopardizes the
universality of mathematical truth [5]: we used to have proofs of Fermat’s little
theorem; we now have COQ proofs of Fermat’s little theorem, ISABELLE/HOL
proofs of Fermat’s little theorem, PVS proofs of Fermat’s little theorem, etc.
Each proof system (COQ, ISABELLE/HOL, PVS, etc.) defines its own language
for mathematical statements and its own truth conditions for these statements.

This crisis can be compared to previous ones, when mathematicians have 
disagreed on the truth of some mathematical statements: the discovery of the 
incommensurability of the diagonal and side of a square, the introduction of 
infinite series, the non-Euclidean geometries, the discovery of the independence 
of the axiom of choice, and the emergence of constructivity. All these past crises 
have been resolved. 


2 Predicate Logic and Other Logical Frameworks 


One way to resolve a crisis, such as that of non-Euclidean geometries, or that of 
the axiom of choice, is to view geometry, or set theory, as an axiomatic theory. 
The judgement that the statement “the sum of the angles in a triangle equals
the straight angle” is true evolves into the judgement that it is a consequence of
the parallel axiom and of the other axioms of geometry. Thus, the truth conditions
must be defined, not for the statements of geometry, but for arbitrary sequents:
pairs $\Gamma \vdash A$ formed with a theory (a set of axioms) $\Gamma$ and a
statement $A$.

This induces a separation between the definition of the truth conditions of
a sequent (the logical framework) and the definition of the various geometries
as theories in this logical framework. This logical framework, Predicate logic,
was made precise by Hilbert and Ackermann [13] in 1928, more than a century
after the beginning of the crisis of non-Euclidean geometries. The invention of
Predicate Logic was a huge step forward. But Predicate Logic also has some
limitations.

© The Author(s) 2022

To overcome these limitations, it has been modernized in various ways in the
last decades. First, λPROLOG [15] and ISABELLE [17] have extended Predicate
logic with variable-binding function symbols, such as the symbol λ in the term
λx x. Then, the λΠ-calculus [12] has made it possible to represent proof-trees
explicitly, using the so-called Brouwer-Heyting-Kolmogorov algorithmic
interpretation of proofs and the Curry-de Bruijn-Howard correspondence. In a
second stream of research, Deduction modulo theory [4,6] has introduced a
distinction between computation and deduction, in such a way that the statement
27 × 37 = 999 computes to 999 = 999, with the algorithm of multiplication, and
then to ⊤, with the algorithm of natural number comparison. It thus has a trivial
proof. A third stream of research has extended classical Predicate logic to an
Ecumenical predicate logic [3, 9–11, 14, 18, 19] with both constructive and
classical logical constants.
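As a loose analogy only (LEAN is not DEDUKTI, and its kernel evaluates natural-number arithmetic with its own built-in machinery rather than with user-declared rewrite rules), the same phenomenon can be observed in LEAN 4: the statement 27 × 37 = 999 is closed by computation alone, with no deduction step beyond reflexivity.

```lean
-- Both sides reduce to the numeral 999, so reflexivity of equality suffices:
-- the proof is pure computation.
example : 27 * 37 = 999 := rfl

-- Equivalently, the decidable equality test can be evaluated.
example : 27 * 37 = 999 := by decide
```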

These streams of research have merged to provide a logical framework, the
λΠ-calculus modulo theory [2], also called Martin-Löf’s logical framework [16].
This framework allows function symbols to bind variables, includes an explicit
representation of proof-trees, distinguishes computation from deduction, and
makes it possible to define both constructive and classical logical constants. It
is the basis of the language DEDUKTI, where Simple type theory, Martin-Löf’s type
theory, the Calculus of constructions, etc. can easily be expressed.


3 The Theory U 


The expressions in DEDUKTI of Simple type theory, Simple type theory with
polymorphism, Simple type theory with predicate subtyping, the Calculus of
constructions, etc. use symbol declarations and computation rules that play the
rôle of axioms in Predicate logic. But, just like the various geometries or the
various set theories share many axioms and differ only in a few, these theories
share many symbols and rules. This remark leads to defining a large theory,
the theory U [1], that contains Simple type theory, Simple type theory with 
polymorphism, Simple type theory with predicate subtyping, and the Calculus 
of constructions, etc. as sub-theories. 

Many proofs developed in proof processing systems can be expressed in the
theory U and, depending on the symbols and rules they use, can be translated
to more common formulations of the theories implemented in these systems.

For instance, F. Thiré has expressed a large library of arithmetic, originally
developed in MATITA, in a sub-theory of the theory U corresponding to Simple
type theory with polymorphism, and translated these proofs to the language
of seven proof systems [20]. Y. Géran has expressed the first book of Euclid’s
Elements, originally developed in COQ, in a sub-theory of the theory U
corresponding to Predicate logic, and translated these proofs to the language of
many proof systems, including predicate logic ones [8]. T. Felicissimo has
shown that a large library of proofs originally developed in MATITA, including
a proof of Bertrand’s postulate, could be expressed in predicative type theory
and translated to AGDA [7].
Abstract. Proof production for SMT solvers is paramount to ensure
their correctness independently from implementations, which are often 
prohibitively difficult to verify. Historically, however, SMT proof pro- 
duction has struggled with performance and coverage issues, resulting in 
the disabling of many crucial solving techniques and in coarse-grained 
(and thus hard to check) proofs. We present a flexible proof-production 
architecture designed to handle the complexity of versatile, industrial- 
strength SMT solvers and show how we leverage it to produce detailed 
proofs, including for components previously unsupported by any solver. 
The architecture allows proofs to be produced modularly, lazily, and with 
numerous safeguards for correctness. This architecture has been imple- 
mented in the state-of-the-art SMT solver cvc5. We evaluate its proofs 
for SMT-LIB benchmarks and show that the new architecture produces 
better coverage than previous approaches, has acceptable performance 
overhead, and supports detailed proofs for most solving components. 


1 Introduction 


SMT solvers [9] are widely used as backbones of formal methods tools in a 
variety of applications, often safety-critical ones. These tools rely on the solver’s 
correctness to guarantee the validity of their results such as, for instance, that an 
access policy does not inadvertently give access to sensitive data [4]. However, 
SMT solvers, particularly industrial-strength ones, are often extremely complex 
pieces of engineering. This makes it hard to ensure that implementation issues do 
not affect results. As the industrial use of SMT solvers increases, it is paramount 
to be able to convince non-experts of the trustworthiness of their results. 

A solution is to decouple confidence from the implementation by coupling 
results with machine-checkable certificates of their correctness. For SMT solvers, 
this amounts to providing proofs of unsatisfiability. The main challenges are
justifying a combination of theory-specific algorithms while keeping the solver 
performant and providing enough details to allow scalable proof checking, i.e., 
checking that is fundamentally simpler than solving. Moreover, while proof pro- 
duction is well understood for propositional reasoning and common theories, 
that is not the case for more expressive theories, such as the theory of strings, 
or for more advanced solver operations such as formula preprocessing. 

We present a new, flexible proof-production architecture for versatile,
industrial-strength SMT solvers and discuss its integration into the cvc5 solver [5].
The architecture (Sect. 2) aims to facilitate the implementation effort via modular
proof production and internal proof checking, so that more critical components
can be enabled when generating proofs. We provide some details on the core proof
calculus and how proofs are produced (Sect. 3), in particular how we support
eager and lazy proof production with built-in proof reconstruction (Sect. 3.2).
This feature is particularly important for substitution and rewriting techniques,
facilitating the instrumentation of notoriously challenging functionalities, such as
simplification under global assumptions [6, Section 6.1] and string solving [40, 46,
48], to produce detailed proofs. Finally, we describe (Sect. 5) how the architecture
is leveraged to produce detailed proofs for most of the theory reasoning, critical
preprocessing, and underlying SAT solving of cvc5. We evaluate proof production
in cvc5 (Sect. 6) by measuring the proof overhead and the proof quality over an
extensive set of benchmarks from SMT-LIB [8].

In summary, our contributions are a flexible proof-producing architecture 
for state-of-the-art SMT solvers, its implementation in cvc5, the production of 
detailed proofs for simplification under global assumptions and the full theory 
of strings, and initial experimental evidence that proof-production overhead is 
acceptable and detailed proofs can be generated for a majority of the problems. 


Preliminaries. We assume the usual notions and terminology of many-sorted
first-order logic with equality ($\approx$) [29]. We consider signatures $\Sigma$ all
containing the distinguished Boolean sort Bool. We adopt the usual definitions of
well-sorted $\Sigma$-terms, with literals and formulas as terms of sort Bool, and
$\Sigma$-interpretations. A $\Sigma$-theory is a pair $T = (\Sigma, I)$ where $I$, the
models of $T$, is a class of $\Sigma$-interpretations closed under variable
reassignment. A $\Sigma$-formula $\varphi$ is $T$-valid (resp., $T$-unsatisfiable) if it
is satisfied by all (resp., no) interpretations in $I$. Two $\Sigma$-terms $s$ and $t$
of the same sort are $T$-equivalent if $s \approx t$ is $T$-valid. We write $\vec{a}$ to
denote a tuple $(a_1, \ldots, a_n)$ of elements, with $n \geq 0$. Depending on
context, we will abuse this notation and also denote the set of the tuple’s
elements or, in case of formulas, their conjunction. Similarly, for term tuples
$\vec{s}, \vec{t}$ of the same length and sort, we will write $\vec{s} \approx \vec{t}$ to
denote the conjunction of equalities between their respective elements.


2 Proof-Production Architecture 


Our proof-production architecture is intertwined with the CDCL(T) architec- 
ture [43], as shown in Fig. 1. Proofs are produced and stored modularly by each 
solving component, which also checks they meet the expected proof structure
for that component, as described below. Proofs are combined only when needed,
via post-processing. The pre-processor receives an input formula $\varphi$ and
simplifies it in a variety of ways into formulas $\varphi_1, \ldots, \varphi_n$. For each
$\varphi_i$, the pre-processor stores a proof $P : \varphi \to \varphi_i$ justifying its
derivation from $\varphi$.

Fig. 1. Flexible proof-production architecture for CDCL(T)-based SMT solvers. In the
above, $\psi_i \in \{\varphi_1, \ldots, \varphi_n\}$ for each $i$, with $\psi_i$ not necessarily
distinct from $\psi_{i+1}$.

The propositional engine receives the preprocessed formulas, and its clausifier
converts them into a conjunctive normal form $C_1 \wedge \cdots \wedge C_l$. A proof
$P : \psi \to C_i$ is stored for each clause $C_i$, where $\psi$ is a preprocessed formula.
Note that several clauses may derive from each formula. Corresponding
propositional clauses $C_1^p, \ldots, C_l^p$, where first-order atoms are abstracted
as Boolean variables, are sent to the SAT solver, which checks their joint
satisfiability. The propositional engine enters a loop with the theory engine,
which considers a set of literals asserted by the SAT solver (corresponding to a
model of the propositional clauses) and verifies its satisfiability modulo a
combination of theories $T$. If the set is $T$-unsatisfiable, a lemma $L$ is sent to
the propositional engine together with its proof $P : L$. Note that since lemmas
are $T$-valid, their proofs have no assumptions. The propositional engine stores
these proofs and clausifies the lemmas, keeping the respective clausification
proofs in the clausifier. The clausified and abstracted lemmas are sent to the
SAT solver to block the current model and cause the assertion of a different set
of literals, if possible. If no new set is asserted, then all the clauses
$C_1, \ldots, C_m$ generated until then are jointly unsatisfiable, and the SAT solver
yields a proof $P : C_1 \wedge \cdots \wedge C_m \to \bot$. Note that the proof is in
terms of the first-order clauses, as are the derivation rules that conclude $\bot$
from them. The propositional abstraction does not need to be represented in the
proof.

The post-processor of the propositional engine connects the assumptions of
the SAT solver proof with the clausifier proofs, building a proof
$P : \varphi_1 \wedge \cdots \wedge \varphi_n \to \bot$. Since theory lemmas are $T$-valid, the
resulting proof only has preprocessed formulas as assumptions. The final proof
is built by the SMT solver’s post-processor combining this proof with the
preprocessing proofs $P : \varphi \to \varphi_i$. The resulting proof $P : \varphi \to \bot$
justifies the $T$-unsatisfiability of the input formula.
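The two post-processing steps can be pictured as replacing assumption leaves by stored proofs. This is a minimal sketch, not cvc5's data structures: formulas are plain strings, rule labels such as `"preprocess"`, `"cnf"` and `"sat_refutation"` are hypothetical and are not checked, and the string `"false"` stands for ⊥.

```python
from dataclasses import dataclass, field


@dataclass
class Proof:
    rule: str                      # hypothetical rule label
    conclusion: str                # formulas are plain strings in this sketch
    children: list = field(default_factory=list)


def assume(formula):
    """A leaf that assumes `formula` without justification."""
    return Proof("assume", formula)


def free_assumptions(p):
    """Formulas of all unjustified assumption leaves below p."""
    if p.rule == "assume":
        return {p.conclusion}
    result = set()
    for child in p.children:
        result |= free_assumptions(child)
    return result


def connect(p, proofs_of):
    """Replace assumption leaves by the stored proofs of the same formulas."""
    if p.rule == "assume":
        return proofs_of.get(p.conclusion, p)
    return Proof(p.rule, p.conclusion, [connect(c, proofs_of) for c in p.children])


# Preprocessing: phi -> phi1; clausification: phi1 -> C1, C2 and lemma L -> C3.
pre = {"phi1": Proof("preprocess", "phi1", [assume("phi")])}
cnf = {
    "C1": Proof("cnf", "C1", [assume("phi1")]),
    "C2": Proof("cnf", "C2", [assume("phi1")]),
    "C3": Proof("cnf", "C3", [Proof("theory_lemma", "L")]),  # lemmas have no assumptions
}
sat = Proof("sat_refutation", "false", [assume("C1"), assume("C2"), assume("C3")])

step1 = connect(sat, cnf)    # propositional engine's post-processor
final = connect(step1, pre)  # SMT solver's post-processor
print(sorted(free_assumptions(step1)))            # ['phi1']
print(free_assumptions(final), final.conclusion)  # {'phi'} false
```

After the first step, only preprocessed formulas remain as assumptions because the lemma's proof has none; after the second, only the input formula does.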


3 The Internal Proof Calculus 


In this section, we specify how proofs are represented in the internal calculus of 
cvc5. We also provide some low-level details on how proofs are constructed and 
managed in our implementation. 

The proof rules of the internal calculus are similar to rules in other calculi for 
ground first-order formulas, except that they are made a little more operational 
by optionally having argument terms and side conditions. Each rule has the form 


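$$r\;\frac{\varphi_1 \;\cdots\; \varphi_n}{\psi} \qquad\text{or}\qquad r\;\frac{\varphi_1 \;\cdots\; \varphi_n \mid t_1 \;\cdots\; t_m}{\psi}\;\;\text{if } C$$

with identifier $r$, premises $\varphi_1, \ldots, \varphi_n$, arguments $t_1, \ldots, t_m$,
conclusion $\psi$, and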
side condition C. The argument terms are used to construct the conclusion from 
the premises and can be used in the side condition together with the premises. 


3.1 Proof Checkers and Proofs 


The semantics of each proof rule $r$ is provided operationally in terms of a
proof-rule checker for $r$. This is a procedure that takes as input a list of argument
terms $\vec{t}$ and a list of premises $\vec{\varphi}$ for $r$. It returns fail if the input is
malformed, i.e., it does not match the rule’s arguments and premises or does not
satisfy the side condition. Otherwise, it returns a conclusion formula $\psi$
expressing the result of applying the rule. All proof rules of the internal calculus
have an associated proof-rule checker. We say that a proof rule proves a formula
$\psi$, from given arguments and premises, if its checker returns $\psi$.
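A proof-rule checker is simply a function from arguments and premises to a conclusion or fail. The sketch below assumes a toy representation (equalities as tuples `("=", s, t)`, `None` playing the role of fail) and hypothetical names; it covers only the rules symm and eq_res, whose intended meaning is described in Sect. 3.2, and is not cvc5's implementation.

```python
def eq(s, t):
    """Build the equality s = t; terms and formulas are strings or tuples here."""
    return ("=", s, t)


def is_eq(f):
    return isinstance(f, tuple) and len(f) == 3 and f[0] == "="


def check_symm(premises, args):
    """symm: from s = t conclude t = s (no arguments)."""
    if args or len(premises) != 1 or not is_eq(premises[0]):
        return None                     # None plays the role of `fail`
    _, s, t = premises[0]
    return eq(t, s)


def check_eq_res(premises, args):
    """eq_res: from phi and phi = psi conclude psi."""
    if args or len(premises) != 2 or not is_eq(premises[1]):
        return None
    phi, (_, lhs, psi) = premises
    if lhs != phi:
        return None                     # the premises do not fit together
    return psi


CHECKERS = {"symm": check_symm, "eq_res": check_eq_res}


def check(rule, premises, args=()):
    """Dispatch to the checker of `rule`; unknown rules fail."""
    checker = CHECKERS.get(rule)
    if checker is None:
        return None
    return checker(list(premises), list(args))


print(check("symm", [eq("a", "b")]))          # ('=', 'b', 'a')
print(check("eq_res", ["p", eq("p", "q")]))   # q
print(check("eq_res", ["p", eq("r", "q")]))   # None
```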

cvc5 has an internal proof checker built modularly out of the individual 
proof-rule checkers. This checker is meant mostly for internal debugging dur- 
ing development, to help guarantee that the constructed proofs are correct. The 
expectation is that users will rely instead on third-party tools to check the proof 
certificates emitted by the solver. 

Fig. 2. Core proof rules of the internal calculus.

A proof object is constructed internally using a data structure that we will
describe abstractly here and call a proof node. This is a triple $(r, \vec{N}, \vec{t})$
consisting of a rule identifier $r$; a sequence $\vec{N}$ of proof nodes, its children;
and a sequence $\vec{t}$ of terms, its arguments. The relationships between proof
nodes and their children induce a directed graph over proof nodes, with edges
from proof nodes to their children. We call a single-root graph rooted at node $N$
a proof. A proof $P$ is well-formed if it is finite, acyclic, and there is a total
mapping $\Psi$ from the nodes of $P$ to formulas such that, for each node
$N = (r, (N_1, \ldots, N_m), \vec{t})$, $\Psi(N)$ is the formula returned by the proof
checker for rule $r$ when given premises $\Psi(N_1), \ldots, \Psi(N_m)$ and arguments
$\vec{t}$. For a well-formed proof $P$ with root $N$ and mapping $\Psi$, the conclusion
of $P$ is the formula $\Psi(N)$; a subproof of $P$ is any proof rooted at a
descendant of $N$ in $P$. For convenience, we will identify a well-formed proof
with its root node from now on.
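Well-formedness can be checked by computing the mapping Ψ bottom-up while watching for cycles. In this sketch (hypothetical names; finiteness is automatic for in-memory objects), proof nodes are compared by identity so that shared subproofs are checked once, the leaf rule `assume` is an assumption for illustration, and only symm is implemented.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, so shared and cyclic nodes can be tracked
class ProofNode:
    rule: str
    children: list = field(default_factory=list)
    args: tuple = ()


def check_assume(premises, args):
    """Hypothetical leaf rule: an assumption concludes its single argument."""
    return args[0] if not premises and len(args) == 1 else None


def check_symm(premises, args):
    """symm: from s = t conclude t = s."""
    if args or len(premises) != 1:
        return None
    p = premises[0]
    if not (isinstance(p, tuple) and len(p) == 3 and p[0] == "="):
        return None
    return ("=", p[2], p[1])


CHECKERS = {"assume": check_assume, "symm": check_symm}


def conclusion(root):
    """Compute Psi(root) bottom-up; return None if the proof is not well-formed."""
    psi = {}          # the mapping Psi, filled in as nodes are checked
    on_path = set()   # nodes on the current DFS path, used to detect cycles

    def visit(node):
        if node in psi:
            return psi[node]            # shared subproof: already checked
        if node in on_path:
            raise ValueError("cyclic proof")
        on_path.add(node)
        premises = [visit(child) for child in node.children]
        on_path.discard(node)
        checker = CHECKERS.get(node.rule)
        result = checker(premises, node.args) if checker else None
        if result is None:
            raise ValueError(f"checker for {node.rule!r} failed")
        psi[node] = result
        return result

    try:
        return visit(root)
    except ValueError:
        return None


leaf = ProofNode("assume", args=(("=", "x", "y"),))
print(conclusion(ProofNode("symm", [leaf])))  # ('=', 'y', 'x')
loop = ProofNode("symm")
loop.children.append(loop)
print(conclusion(loop))                        # None
```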


3.2 Core Proof Rules 


In total, the internal calculus of cvc5 consists of 155 proof rules,¹ which cover
all reasoning performed by the SMT solver, including theory-specific rules, rules 
for Boolean reasoning, and others. In the remainder of this section, we describe 
the core rules of the internal calculus, which are used throughout the system, 
and are illustrated in Fig. 2. 


Proof Rules for Equality. Many theory solvers in cvc5 perform theory-specific 
reasoning on top of basic equational reasoning. The latter is captured by the 
proof rules eq_res, refl, symm, trans, and cong. The first rule is used to prove a
formula $\psi$ from a formula $\varphi$ that was proved equivalent to $\psi$. The rest are
the standard rules for computing the congruence closure of a set of term equalities.
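The remaining equality rules can be given checkers in the same operational style. This sketch uses a toy representation (equalities as `("=", s, t)`, applications as `(f, arg1, ..., argn)`, `None` for fail); passing the function symbol of cong as its single argument is our assumption for illustration, not necessarily cvc5's convention.

```python
def is_eq(f):
    return isinstance(f, tuple) and len(f) == 3 and f[0] == "="


def check_refl(premises, args):
    """refl: with argument t and no premises, conclude t = t."""
    if premises or len(args) != 1:
        return None
    return ("=", args[0], args[0])


def check_trans(premises, args):
    """trans: from t1 = t2, t2 = t3, ..., t(n-1) = tn conclude t1 = tn."""
    if args or not premises or not all(is_eq(p) for p in premises):
        return None
    for (_, _, rhs), (_, lhs, _) in zip(premises, premises[1:]):
        if rhs != lhs:
            return None                 # the chain of equalities is broken
    return ("=", premises[0][1], premises[-1][2])


def check_cong(premises, args):
    """cong: with argument f, from s1 = t1, ..., sn = tn conclude f(s1..sn) = f(t1..tn)."""
    if len(args) != 1 or not premises or not all(is_eq(p) for p in premises):
        return None
    f = args[0]
    lhs = (f, *(p[1] for p in premises))
    rhs = (f, *(p[2] for p in premises))
    return ("=", lhs, rhs)


print(check_trans([("=", "a", "b"), ("=", "b", "c")], []))  # ('=', 'a', 'c')
print(check_cong([("=", "a", "b")], ["f"]))                 # ('=', ('f', 'a'), ('f', 'b'))
```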


Proof Rules for Rewriting, Substitution and Witness Forms. A single 
coarse-grained rule, sr, is used for tracking justifications for core utilities in the 
SMT solver such as rewriting and substitution. This rule, together with other 
non-core rules with side conditions (omitted for brevity), allows the generation of 
coarse-grained proofs that trust the correctness of complex side conditions. Those 
conditions involve rewriting and substitution operations performed by cvc5 dur- 
ing solving. More fine-grained proofs can be constructed from coarse-grained 
ones by justifying the various rewriting and substitution steps in terms of sim- 
pler proof rules. This is done with the aid of the equality rules mentioned above 
and the additional core rules atom_rewrite and witness. To describe atom_rewrite, 
witness, and sr, we first need to introduce some definitions and notations. 


¹ See https://cvc5.github.io/docs/cvc5-1.0.0/proofs/proof_rules.html.


20 H. Barbosa et al. 


A rewriter R is a function over terms that preserves equivalence in the
background theory T, i.e., returns a term $t{\downarrow}_R$ T-equivalent to its input t. We call
$t{\downarrow}_R$ the rewritten form of t with respect to R. Currently, cvc5 uses a handful
of specialized rewriters for various purposes, such as evaluating constant terms, 
preprocessing input formulas, and normalizing terms during solving. Each indi- 
vidual rewrite step executed by a rewriter R is justified in fine-grained proofs 
by an application of the rule atom_rewrite, which takes as argument both (an 
identifier for) R and the term s the rewrite was applied to. Note that the rule’s 
soundness requires that the rewrite step be equivalence preserving. 

A (term) substitution $\sigma$ is a finite sequence $(t_1 \mapsto s_1, \ldots, t_n \mapsto s_n)$ of oriented
pairs of terms of the same sort. A substitution method S is a function that takes a
term r and a substitution $\sigma$ and returns a new term that is the result of applying
$\sigma$ to r, according to some strategy. We write $S(r, \sigma)$ to denote the resulting term.
We distinguish three kinds of substitution methods for $\sigma$: simultaneous, which
returns the term obtained by simultaneously replacing every occurrence of term
$t_i$ in r with $s_i$, for $i = 1, \ldots, n$; sequential, which splits $\sigma$ into n substitutions
$(t_1 \mapsto s_1), \ldots, (t_n \mapsto s_n)$ and applies them in sequence to r using the simultaneous
strategy above; and fixed-point, which, starting with r, repeatedly applies $\sigma$
with the simultaneous strategy until no further subterm replacements are possible.
For example, consider the application $S(y, (x \mapsto u, y \mapsto f(z), z \mapsto g(x)))$.
The steps the substitution method takes in computing its result are the following:
$y \rightsquigarrow f(z)$ if S is simultaneous; $y \rightsquigarrow f(z) \rightsquigarrow f(g(x))$ if S is sequential; and
$y \rightsquigarrow f(z) \rightsquigarrow f(g(x)) \rightsquigarrow f(g(u))$ if S is fixed-point.
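The three strategies differ only in how often $\sigma$ is re-applied. The following is a minimal Python sketch that reproduces the example above. It assumes a toy term encoding that is not cvc5's: a variable is a string, an application $f(t_1, \ldots, t_n)$ is a tuple ("f", t1, ..., tn), and a substitution is a list of pairs. The round bound in fixed_point is an addition of the sketch that guards against substitutions that never stabilize, such as $x \mapsto f(x)$.

```python
from typing import List, Tuple, Union

# Toy term encoding (an assumption for illustration only):
# a variable is a str, an application f(t1, ..., tn) is ("f", t1, ..., tn).
Term = Union[str, tuple]
Substitution = List[Tuple[Term, Term]]


def simultaneous(r: Term, sigma: Substitution) -> Term:
    """Replace every occurrence of each t_i in r by s_i in a single pass."""
    mapping = dict(sigma)

    def go(t: Term) -> Term:
        if t in mapping:
            return mapping[t]  # inserted terms are not traversed again
        if isinstance(t, tuple):
            return (t[0],) + tuple(go(arg) for arg in t[1:])
        return t

    return go(r)


def sequential(r: Term, sigma: Substitution) -> Term:
    """Apply the pairs of sigma one at a time, each simultaneously."""
    for pair in sigma:
        r = simultaneous(r, [pair])
    return r


def fixed_point(r: Term, sigma: Substitution, max_rounds: int = 1000) -> Term:
    """Apply sigma simultaneously until the term no longer changes."""
    for _ in range(max_rounds):
        nxt = simultaneous(r, sigma)
        if nxt == r:
            return r
        r = nxt
    raise RuntimeError("substitution did not reach a fixed point")


# The example from the text: S(y, (x |-> u, y |-> f(z), z |-> g(x))).
SIGMA = [("x", "u"), ("y", ("f", "z")), ("z", ("g", "x"))]
print(simultaneous("y", SIGMA))  # ('f', 'z')
print(sequential("y", SIGMA))    # ('f', ('g', 'x'))
print(fixed_point("y", SIGMA))   # ('f', ('g', 'u'))
```

Because simultaneous replacement never revisits the terms it inserts, the z inside f(z) is left alone; only the fixed-point strategy keeps applying $\sigma$ until no $t_i$ occurs any more.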

In cvc5, we use a substitution derivation method D to derive a contextual
substitution $(t_1 \mapsto s_1, \ldots, t_n \mapsto s_n)$ from a collection $\vec{\varphi}$ of derived formulas. The
substitution essentially orients a selection of term equalities $t_i \approx s_i$ entailed by
$\vec{\varphi}$ and, as such, can be applied soundly to formulas derived from $\vec{\varphi}$.² We write
$D(\vec{\varphi})$ to denote the substitution computed by D from $\vec{\varphi}$.

Finally, cvc5 often introduces fresh variables, or Skolem variables, which are
implicitly globally existentially quantified. This happens as a consequence of
Skolemization of existential variables, lifting of if-then-else terms, and some kinds
of flattening. Each Skolem variable k is associated with a term $k{\uparrow}$ of the same
sort containing no Skolem variables, called its witness term. This global map
from Skolem variables to their witness terms allows cvc5 to detect when two
Skolem variables can be equated, as a consequence of their respective witness
terms becoming equivalent in the current context [47]. Witness terms can also be
used to eliminate Skolem variables at proof output time. We write $t{\uparrow}$ to denote
the witness form of term t, which is obtained by replacing every Skolem variable
in t by its witness term. For example, if $k_1$ and $k_2$ are Skolem variables with
associated witness terms $\mathit{ite}(x \approx z, y, z)$ and $y - z$, respectively, and $\varphi$ is the
formula $\mathit{ite}(x \approx k_2, k_1 \approx y, k_1 \approx z)$, the witness form $\varphi{\uparrow}$ of $\varphi$ is the formula
$\mathit{ite}(x \approx y - z, \mathit{ite}(x \approx z, y, z) \approx y, \mathit{ite}(x \approx z, y, z) \approx z)$. When a Skolem variable k


² Observe that substitutions are generated dynamically from the formulas being
processed, whereas rewrite rules are hard-coded in cvc5’s rewriters.
appears in a proof, the witness proof rule is used to explicitly constrain its value
to be the same as that of the term $k{\uparrow}$ it abstracts.³
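Because witness terms contain no Skolem variables, a single replacement pass suffices to compute a witness form. The minimal Python sketch below reproduces the example above. Its encoding is a hypothetical choice made for illustration: variables and Skolems are strings, applications are tuples, and "=" stands for $\approx$.

```python
from typing import Dict, Union

# Hypothetical encoding: variables and Skolems are str, applications are tuples
# ("op", arg1, ..., argn); "=" stands for the equality symbol.
Term = Union[str, tuple]

# Witness terms of the Skolem variables k1 and k2 from the example above.
WITNESS: Dict[str, Term] = {
    "k1": ("ite", ("=", "x", "z"), "y", "z"),
    "k2": ("-", "y", "z"),
}


def witness_form(t: Term, witness: Dict[str, Term] = WITNESS) -> Term:
    """Replace every Skolem variable in t by its witness term.

    A single pass suffices because witness terms contain no Skolem variables.
    """
    if isinstance(t, str):
        return witness.get(t, t)
    return (t[0],) + tuple(witness_form(arg, witness) for arg in t[1:])


phi = ("ite", ("=", "x", "k2"), ("=", "k1", "y"), ("=", "k1", "z"))
print(witness_form(phi))
```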

We can now explain the sr proof rule, which is parameterized by a substitution
method S, a rewriter R, and a substitution derivation method D. The rule is used
to transform the proof of a formula $\varphi$ into one of a formula $\psi$ provided that the
two formulas are equal up to rewriting under a substitution derived from the
premises $\vec{\varphi}$. Note that this rule is quite general because its conclusion $\psi$, which
is provided as an argument, can be any formula that satisfies the side condition.
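Read in simplified form, the side condition asks that both formulas become syntactically equal once the derived substitution is applied and the results are rewritten. The Python sketch below checks that reading under assumptions that are not from cvc5: witness forms are ignored, the substitution is simultaneous, the toy derivation method only orients premises $x \approx t$ with x a variable, and the toy rewriter only folds additions of numerals and removes "+ 0". All helper names are hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical encoding: variables are str, numerals are int,
# applications are tuples ("op", arg1, ..., argn).
Term = Union[str, int, tuple]


def derive_substitution(premises: List[Term]) -> Dict[str, Term]:
    """Toy D: orient every premise ("=", x, t) whose left side is a variable."""
    return {p[1]: p[2] for p in premises
            if isinstance(p, tuple) and p[0] == "=" and isinstance(p[1], str)}


def substitute(t: Term, sigma: Dict[str, Term]) -> Term:
    """Simultaneous substitution of variables (the only keys the toy D produces)."""
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, sigma) for a in t[1:])
    if isinstance(t, str):
        return sigma.get(t, t)
    return t


def rewrite(t: Term) -> Term:
    """Toy rewriter: bottom-up, folds n + m on numerals and drops "+ 0"."""
    if not isinstance(t, tuple):
        return t
    t = (t[0],) + tuple(rewrite(a) for a in t[1:])
    if t[0] == "+" and len(t) == 3:
        a, b = t[1], t[2]
        if isinstance(a, int) and isinstance(b, int):
            return a + b
        if b == 0:
            return a
        if a == 0:
            return b
    return t


def sr_side_condition(phi: Term, premises: List[Term], psi: Term) -> bool:
    """phi and psi must coincide after substitution and rewriting."""
    sigma = derive_substitution(premises)
    return rewrite(substitute(phi, sigma)) == rewrite(substitute(psi, sigma))


# From P(x + y) and the premise y = 0, sr may conclude P(x).
print(sr_side_condition(("P", ("+", "x", "y")), [("=", "y", 0)], ("P", "x")))  # True
```

Since the conclusion is supplied as an argument, a checker only needs this equality test and never has to compute $\psi$ itself.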


Proof Rules for Scoped Reasoning. Two of the core proof rules, assume
and scope, enable local reasoning. Together they achieve the effect of the
$\Rightarrow$-introduction rule of Natural Deduction. However, separating the local
assumption functionality in assume provides more flexibility. That rule has no premises
and introduces a local assumption $\varphi$ provided as an argument. The scope rule
is used to close the scope of the local assumptions $\varphi_1, \ldots, \varphi_n$ made to prove a
formula $\varphi$, inferring the formula $\varphi_1 \wedge \cdots \wedge \varphi_n \Rightarrow \varphi$.

We say that $\varphi$ is a free assumption in proof P if P has a node (assume, (), $\varphi$)
that is not a subproof of a scope node with $\varphi$ as one of its arguments. A proof
is closed if it has no free assumptions, and open otherwise.
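Free assumptions, and hence closedness, can be computed in one traversal that tracks which assumptions enclosing scope nodes discharge. This is a minimal Python sketch under a hypothetical encoding: proof nodes are (rule, premises, arguments) triples mirroring the notation above, formulas are plain strings, and the argument of an assume node is wrapped in a one-element tuple.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical encoding: a proof node is (rule, premises, arguments), where
# premises is a tuple of nodes and formulas are plain strings.
Node = Tuple[str, tuple, tuple]


def free_assumptions(node: Node, discharged: FrozenSet[str] = frozenset()) -> Set[str]:
    """Assumptions introduced by assume that no enclosing scope discharges."""
    rule, premises, args = node
    if rule == "assume":
        phi = args[0]
        return set() if phi in discharged else {phi}
    if rule == "scope":
        # The arguments of a scope node are the assumptions it closes.
        discharged = discharged | frozenset(args)
    result: Set[str] = set()
    for premise in premises:
        result |= free_assumptions(premise, discharged)
    return result


def is_closed(node: Node) -> bool:
    return not free_assumptions(node)


# "some_rule" is a placeholder for any rule with two premises.
inner = ("some_rule", (("assume", (), ("p",)), ("assume", (), ("q",))), ())
print(sorted(free_assumptions(inner)))                       # ['p', 'q']
print(sorted(free_assumptions(("scope", (inner,), ("p",)))))  # ['q']
print(is_closed(("scope", (inner,), ("p", "q"))))            # True
```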


Soundness. All proof rules other than assume are sound with respect to the
background theory T in the following sense: if a rule proves a formula $\psi$ from
premises $\vec{\varphi}$, every model of T that satisfies $\vec{\varphi}$ and assigns the same values to
Skolem variables and their respective witness terms satisfies $\psi$ as well. Based on
this and a simple structural induction argument, one can show that well-formed
closed proofs have T-valid conclusions. In contrast, open proofs have conclusions
that are T-valid only under assumptions. More precisely, in general, if $\vec{\varphi}$ are all
the free assumptions of a well-formed proof P with conclusion $\psi$ and $\vec{k}$ are all
the Skolem variables introduced in P, then $\vec{k} \approx \vec{k}{\uparrow} \wedge \vec{\varphi} \Rightarrow \psi$ is T-valid.


3.3 Constructing Proof Nodes 


We have implemented a library of proof generators that encapsulates common 
patterns for constructing proof nodes. We assume a method getProof that takes
the proof generator g and a formula $\varphi$ as input and returns a proof node with
conclusion $\varphi$ based on the information in g. During solving, cvc5 uses a
combination of eager and lazy proof generation. In general terms, eager proof generation
involves constructing proof nodes for inference steps at the time those steps are 
taken during solving. Eager proof generation may be required if the computation 
state pertinent to that inference cannot be easily recovered later. In contrast, 
lazy proof generation occurs for inferred formulas associated with proof genera- 
tors that can do internal bookkeeping to be able to construct proof nodes for the 
formula after solving is completed. Depending on the formula, different kinds of 
proof generators are used. For brevity, we only describe in detail (see Sect. 3.2) 


³ The proof rules that account for the introduction of Skolem variables in the first
place are not part of the core set and so are not discussed here.
Algorithm 1. Proof generation for term-conversion generators, rewrite-once
policy. B is a lazy proof builder, R a map from terms to their converted form,
and Cpre, Cpost are sets of pairs of equalities and the proof generators justifying
them.

getProof(g, $\varphi$), where g contains Cpre, Cpost and $\varphi$ is $t_1 \approx t_2$

1: $B := \emptyset$, $R := \emptyset$
2: getTermConv($t_1$, Cpre, Cpost, B, R)
3: if $R[t_1] \neq t_2$ then fail else return getProof(B, $t_1 \approx R[t_1]$)

getTermConv(s, Cpre, Cpost, B, R), where $s = f(s_1, \ldots, s_n)$

1: if $s \in \mathrm{dom}(R)$ then return
2: if $(s \approx s', g') \in$ Cpre for some $s', g'$ then
3:     $R[s] := s'$, addLazyStep(B, $s \approx s'$, $g'$)
4:     return
5: for $1 \le i \le n$ do getTermConv($s_i$, Cpre, Cpost, B, R)
6: $R[s] := r$, where $r = f(R[s_1], \ldots, R[s_n])$
7: if $s \neq r$ then addStep(B, cong, ($s_1 \approx R[s_1], \ldots, s_n \approx R[s_n]$), f)
8: else addStep(B, refl, (), $s \approx s$)
9: if $(r \approx r', g') \in$ Cpost for some $r', g'$ then
10:    $R[s] := r'$, addLazyStep(B, $r \approx r'$, $g'$), addStep(B, trans, ($s \approx r, r \approx r'$), ())


the proof generator most relevant to the core calculus, the term-conversion proof 
generator, targeted for substitution and rewriting proofs. 
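The contrast between eager and lazy proof generation can be sketched as two classes that share a getProof-style method. This is a hypothetical Python illustration, not cvc5's C++ API: the class names, the get_proof method, the string formulas, and the callback-based lazy generator are all assumptions made for exposition.

```python
from typing import Callable, Dict

Proof = tuple  # stand-in for a proof node


class EagerProofGenerator:
    """Stores proof nodes that were built when the inference was made."""

    def __init__(self) -> None:
        self._proofs: Dict[str, Proof] = {}

    def add(self, formula: str, proof: Proof) -> None:
        self._proofs[formula] = proof

    def get_proof(self, formula: str) -> Proof:
        return self._proofs[formula]


class LazyProofGenerator:
    """Keeps only bookkeeping during solving and builds proofs on request."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable[[], Proof]] = {}
        self.built = 0  # number of proofs actually constructed so far

    def register(self, formula: str, builder: Callable[[], Proof]) -> None:
        self._builders[formula] = builder

    def get_proof(self, formula: str) -> Proof:
        self.built += 1
        return self._builders[formula]()


eager = EagerProofGenerator()
eager.add("a = a", ("refl", (), ("a",)))
lazy = LazyProofGenerator()
lazy.register("b = b", lambda: ("refl", (), ("b",)))
print(lazy.built)               # 0: nothing is built until requested
print(lazy.get_proof("b = b"))  # ('refl', (), ('b',))
```

The lazy variant pays only for the formulas whose proofs are actually requested after solving, at the price of having to keep enough state to rebuild them.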


4 Proof Reconstruction for Substitution and Rewriting 


Once it determines that the input formulas $\varphi_1, \ldots, \varphi_n$ are jointly unsatisfiable,
the SMT solver has a reference to a proof node P that concludes $\bot$ from the
free assumptions $\varphi_1, \ldots, \varphi_n$. After the post-processor is run, the (closed) proof
(scope, P’, ($\varphi_1, \ldots, \varphi_n$)) is then generated as the final proof for the user, where
P’ is the result of optionally expanding coarse-grained steps (in particular, appli- 
cations of the rule sr) in P into fine-grained ones. To do so, we require the 
following algorithm for generating term-conversion proofs. 

In particular, we focus on equalities $t \approx s$ whose proof can be justified by
a set of steps that replace subterms of t until it is syntactically equal to s. We
assume these steps are provided to a term-conversion proof generator. Formally,
a term-conversion proof generator g is a pair of sets Cpre and Cpost. The set Cpre
(resp., Cpost) contains pairs of the form $(t \approx s, g_{t \approx s})$ indicating that t should
be replaced by s in a preorder (resp., postorder) traversal of the terms that g
processes, where $g_{t \approx s}$ is a proof generator that can prove the equality $t \approx s$. We
require that neither Cpre nor Cpost contains multiple entries of the form $(t \approx s_1, g_1)$
and $(t \approx s_2, g_2)$ for distinct $(s_1, g_1)$ and $(s_2, g_2)$.

The procedure for generating proofs from a term-conversion proof generator 
g is given in Algorithm 1. When asked to prove an equality $t_1 \approx t_2$, getProof
traverses the structure of $t_1$ and applies steps from the sets Cpre and Cpost from g.
The traversal is performed by the auxiliary procedure getTermConv, which relies
on two data structures. The first is a lazy proof builder B that stores the
intermediate steps in the overall proof of $t_1 \approx t_2$. The proof builder is given these
steps either via addStep, as a concrete triple with the proof rule, a list of premise
formulas, and a list of argument terms, or as a lazy step via addLazyStep, with a
formula and a reference to another generator that can prove that formula. The
second data structure is a mapping R from terms to terms that is updated (using
array syntax in the pseudo-code) as the converted form of terms is computed
by getTermConv. For any term s, executing getTermConv(s, Cpre, Cpost, B, R) will
result in R[s] containing the converted form of s according to the rewrites in Cpre
and Cpost, and B storing a proof step for $s \approx R[s]$. Thus, the procedure getProof
succeeds when, after invoking getTermConv($t_1$, Cpre, Cpost, B, R) with B and R
initially empty, the mapping R contains $t_2$ as the converted form of $t_1$. The proof
for the equality $t_1 \approx R[t_1]$ can then be constructed by calling getProof on the
lazy proof builder B, based on the (lazy) steps stored in it.

Each subterm s of tı is traversed only once by getTermConv by checking 
whether R already contains the converted form of s. When that is not the case, 
s is first preorder processed. If Cpre contains an entry indicating that s rewrites 
to s’, this rewrite step is added to the lazy proof builder and the converted form 
R[s] of s is set to s’. Otherwise, the immediate subterms of s, if any, are traversed
and then s is postorder processed. The converted form of s is set to some term
r of the form $f(R[s_1], \ldots, R[s_n])$, considering how its immediate subterms were
converted. Note that B will contain steps for $s_i \approx R[s_i]$. Thus, the equality $s \approx r$
can be proven by congruence for function f with these premises if $s \neq r$, and by
reflexivity otherwise. Furthermore, if Cpost indicates that r rewrites to r’, then
this step is added to the lazy proof builder; a transitivity step is added to prove
$s \approx r'$ from $s \approx r$ and $r \approx r'$; and the converted form R[s] is set to r’.
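Algorithm 1 translates almost line by line into Python. The sketch below is runnable under simplifying assumptions that are not from cvc5: terms are strings (constants) or tuples ("f", s1, ..., sn); Cpre and Cpost are dicts from a term to a pair (converted term, generator name); and the lazy proof builder B is reduced to a dict mapping each proved equality (lhs, rhs) to the step recorded for it, so the final extraction of a proof tree from B is left out. It replays Example 1, with "bot" standing for $\bot$.

```python
from typing import Dict, Union

# Hypothetical encoding: constants are str, applications are ("f", s1, ..., sn).
Term = Union[str, tuple]


def args_of(s: Term) -> tuple:
    return s[1:] if isinstance(s, tuple) else ()


def rebuild(s: Term, new_args: tuple) -> Term:
    return (s[0],) + new_args if isinstance(s, tuple) else s


def get_term_conv(s: Term, c_pre: Dict, c_post: Dict, B: Dict, R: Dict) -> None:
    """Rewrite-once getTermConv; comments name the lines of Algorithm 1."""
    if s in R:                                          # line 1: already converted
        return
    if s in c_pre:                                      # lines 2-4: preorder step
        s2, g = c_pre[s]
        R[s] = s2
        B[(s, s2)] = ("lazy", g)
        return
    for si in args_of(s):                               # line 5: convert children
        get_term_conv(si, c_pre, c_post, B, R)
    r = rebuild(s, tuple(R[si] for si in args_of(s)))   # line 6
    R[s] = r
    if s != r:                                          # line 7: congruence
        B[(s, r)] = ("cong", tuple((si, R[si]) for si in args_of(s)))
    else:                                               # line 8: reflexivity
        B[(s, s)] = ("refl",)
    if r in c_post:                                     # lines 9-10: postorder step
        r2, g = c_post[r]
        R[s] = r2
        B[(r, r2)] = ("lazy", g)
        B[(s, r2)] = ("trans", ((s, r), (r, r2)))


def get_proof(c_pre: Dict, c_post: Dict, t1: Term, t2: Term) -> Dict:
    """Run getTermConv from empty B and R; fail unless t1 converts to t2."""
    B: Dict = {}
    R: Dict = {}
    get_term_conv(t1, c_pre, c_post, B, R)
    if R[t1] != t2:
        raise ValueError("t1 does not convert to t2")
    return B


# Example 1: t = f(b) + f(a) < f(a - 0) + f(b).
fa, fb = ("f", "a"), ("f", "b")
a_minus_0 = ("-", "a", "0")
lhs, rhs = ("+", fb, fa), ("+", ("f", a_minus_0), fb)
t = ("<", lhs, rhs)
norm = ("+", fa, fb)
c_pre = {lhs: (norm, "g_AC"), a_minus_0: ("a", "g_Arith0")}
c_post = {("<", norm, norm): ("bot", "g_Arith1")}
B = get_proof(c_pre, c_post, t, "bot")
print(B[(t, "bot")][0])  # trans
```

Starting from the entry for (t, "bot") and following the premises recorded in each step recovers the proof extracted in Example 1.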


Example 1. Consider the equality $t \approx \bot$, where $t = f(b) + f(a) < f(a - 0) + f(b)$,
and suppose the conversion of t is justified by a term-conversion proof generator
g containing the sets Cpre = $\{(f(b) + f(a) \approx f(a) + f(b), g^{AC}), (a - 0 \approx a, g^{Arith}_0)\}$
and Cpost = $\{(f(a) + f(b) < f(a) + f(b) \approx \bot, g^{Arith}_1)\}$. The generator $g^{AC}$ provides
a proof based on associative and commutative reasoning, whereas $g^{Arith}_0$ and
$g^{Arith}_1$ provide proofs based on arithmetic reasoning. Invoking getProof(g, $t \approx \bot$)
initiates the traversal with getTermConv(t, Cpre, Cpost, $\emptyset$, $\emptyset$). Since t is not in the
conversion map, it is preorder processed. However, as it does not occur in Cpre,
nothing is done and its subterms are traversed. The subterm $f(b) + f(a)$ is
equated to $f(a) + f(b)$ in Cpre, justified by $g^{AC}$. Therefore R is updated with
$R[f(b) + f(a)] = f(a) + f(b)$ and the respective lazy step is added to B. The
subterms of $f(b) + f(a)$ are not traversed, therefore the next term to be traversed
is $f(a - 0) + f(b)$. Since it does not occur in Cpre, its subterm $f(a - 0)$ is traversed,
which analogously leads to the traversal of $a - 0$. As $a - 0$ does occur in Cpre, both R
and B are updated accordingly and the processing of its parent $f(a - 0)$ resumes.
A congruence step added to B justifies its conversion to $f(a)$ being added to R.
No more additions happen since $f(a)$ does not occur in Cpost. Analogously, R and
B are updated with $f(b)$ not changing and $f(a - 0) + f(b)$ being converted into
$f(a) + f(b)$. Finally, the processing returns to the initial term t, which has been
converted to $R[f(b) + f(a)] < R[f(a - 0) + f(b)]$, i.e., $f(a) + f(b) < f(a) + f(b)$.
Since this term is equated to $\bot$ in Cpost, justified by $g^{Arith}_1$, the respective lazy
step is added to B, as well as a transitivity step to connect
$f(b) + f(a) < f(a - 0) + f(b) \approx f(a) + f(b) < f(a) + f(b)$ and $f(a) + f(b) < f(a) + f(b) \approx \bot$.
At this point, the execution terminates with $R[f(b) + f(a) < f(a - 0) + f(b)] = \bot$,
as expected. A proof for $t \approx \bot$ with the following structure can then be extracted
from B:

    $t \approx \bot$ by trans from
      P0: $f(b) + f(a) < f(a - 0) + f(b) \approx f(a) + f(b) < f(a) + f(b)$ by cong from
        $f(b) + f(a) \approx f(a) + f(b)$ (lazy step, $g^{AC}$) and
        P1: $f(a - 0) + f(b) \approx f(a) + f(b)$ by cong from
          $f(a - 0) \approx f(a)$ by cong from $a - 0 \approx a$ (lazy step, $g^{Arith}_0$) and
          $f(b) \approx f(b)$ by refl
      $f(a) + f(b) < f(a) + f(b) \approx \bot$ (lazy step, $g^{Arith}_1$)

We use several extensions to the procedures in Algorithm 1. Notice that this 
procedure follows the policy that terms on the right-hand side of conversion 
steps (equalities from Cpre and Cpost) are not traversed further. The procedure 
getTermConv is used by term-conversion proof generators that have the rewrite- 
once policy. A similar procedure which additionally traverses those terms is used 
by term-conversion proof generators that have a rewrite-to-fixpoint policy. 
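The two policies differ only in whether the right-hand sides of conversion steps are converted again. The hypothetical Python sketch below computes converted forms only, with no proof steps, and selects the policy with a flag. It assumes that Cpre and Cpost map terms directly to their replacements and, for the rewrite-to-fixpoint policy, that the conversion steps terminate.

```python
from typing import Dict, Optional, Union

# Hypothetical encoding: constants are str, applications are ("f", s1, ..., sn).
Term = Union[str, tuple]


def convert(s: Term, c_pre: Dict[Term, Term], c_post: Dict[Term, Term],
            fixpoint: bool, R: Optional[Dict[Term, Term]] = None) -> Term:
    """Converted form of s; with fixpoint=True, right-hand sides are converted too.

    Assumes the conversion steps terminate (no cycles such as a -> b -> a).
    """
    R = {} if R is None else R
    if s in R:
        return R[s]
    if s in c_pre:                      # preorder step
        s2 = c_pre[s]
        R[s] = convert(s2, c_pre, c_post, fixpoint, R) if fixpoint else s2
        return R[s]
    if isinstance(s, tuple):            # convert the immediate subterms
        r = (s[0],) + tuple(convert(a, c_pre, c_post, fixpoint, R) for a in s[1:])
    else:
        r = s
    if r in c_post:                     # postorder step
        r2 = c_post[r]
        r = convert(r2, c_pre, c_post, fixpoint, R) if fixpoint else r2
    R[s] = r
    return r


PRE = {"a": "b", "b": "c"}
print(convert(("f", "a"), PRE, {}, fixpoint=False))  # ('f', 'b')
print(convert(("f", "a"), PRE, {}, fixpoint=True))   # ('f', 'c')
```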

We now show how the term-conversion proof generator can be used for recon- 
structing fine-grained proofs from coarse-grained ones. In particular we focus on 
proofs $P_\psi$ of the form (sr, ($Q_{\varphi_0}$, $\vec{Q}$), (S, R, D, $\psi$)). Recall from Fig. 2 that the
proof rule sr concludes a formula $\psi$ that can be shown equivalent to the
formula $\varphi_0$ proven by $Q_{\varphi_0}$, based on a substitution derived from the conclusions of
the nodes $\vec{Q}$. A proof like $P_\psi$ above can be transformed to one that involves
(atomic) theory rewrites and equality rules only. We show this transformation 
in two phases. In the first phase, the proof is expanded to: 


(eq_res, ($Q_{\varphi_0}$, (trans, ($R_0$, (symm, $R_1$)))))


with $R_i$ = (trans, (subs, $\vec{Q}$, (S, D, $\varphi_i$)), (rewrite, (), (R, $S(\varphi_i, D(\vec{\varphi}))$))) for
$i \in \{0, 1\}$, where $\varphi_1$ is $\psi$, $\vec{\varphi}$ are the conclusions of $\vec{Q}$, and subs and rewrite are
auxiliary proof rules used for further expansion in the second phase. We describe them next.


Substitution Steps. Let $P_{t \approx s}$ be the subproof (subs, $\vec{Q}$, (S, D, t)) of $R_i$ above
proving $t \approx s$ with $s = S(t, D(\vec{\varphi}))$ and $D(\vec{\varphi}) = (t_1 \mapsto s_1, \ldots, t_n \mapsto s_n)$.
Substitution steps can be expanded to fine-grained proofs using a term-conversion
proof generator. First, for each $j = 1, \ldots, n$, we construct a proof of $t_j \approx s_j$,
which involves simple transformations on the proofs of $\vec{\varphi}$. Suppose we store all
of these in an eager proof generator g. If S is a simultaneous or fixed-point
substitution, we then build a single term-conversion proof generator C, which,
recall, is modeled as a pair of mappings (Cpre, Cpost). We add $(t_j \approx s_j, g)$ to Cpre
for all j. We use the rewrite-once policy for C if S is a simultaneous substitution,
and the rewrite-to-fixpoint policy for C otherwise. We then replace the
proof $P_{t \approx s}$ by getProof(C, $t \approx s$), which runs the procedure in Algorithm 1.
Otherwise, if S is a sequential substitution, we construct a term-conversion
generator $C_j$ for each j, initializing it so that its Cpre set contains the single
rewrite step $(t_j \approx s_j, g)$ and it uses a rewrite-once policy. We then replace the
proof $P_{t \approx s}$ by (trans, ($P_1, \ldots, P_n$)) where, for $j = 1, \ldots, n$, $P_j$ is generated by
getProof($C_j$, $s'_{j-1} \approx s'_j$), with $s'_0 = t$, $s'_j$ the result of applying the first j steps of
$D(\vec{\varphi})$ to t, and $s'_n = s$.
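For a sequential substitution, the intermediate terms and the equalities handed to the per-pair generators can be computed as in this Python sketch. It uses a hypothetical tuple-based term encoding, and the function names are illustrative, not cvc5's. Consecutive equalities share an endpoint, which is what lets a single trans step chain them; a pair whose left-hand side does not occur yields a reflexive equality.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: variables are str, applications are ("f", t1, ..., tn).
Term = Union[str, tuple]


def replace_all(t: Term, lhs: Term, rhs: Term) -> Term:
    """Apply the single pair lhs |-> rhs simultaneously to t."""
    if t == lhs:
        return rhs
    if isinstance(t, tuple):
        return (t[0],) + tuple(replace_all(a, lhs, rhs) for a in t[1:])
    return t


def sequential_chain(t: Term, sigma: List[Tuple[Term, Term]]) -> List[Tuple[Term, Term]]:
    """Equalities (u_{j-1}, u_j) for j = 1..n, with u_0 = t and u_n the final result."""
    chain, current = [], t
    for lhs, rhs in sigma:
        nxt = replace_all(current, lhs, rhs)
        chain.append((current, nxt))  # one term-conversion generator per pair
        current = nxt
    return chain


SIGMA = [("x", "u"), ("y", ("f", "z")), ("z", ("g", "x"))]
chain = sequential_chain("y", SIGMA)
print(chain[0])       # ('y', 'y')
print(chain[-1][1])   # ('f', ('g', 'x'))
```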


Rewrite Steps. Let P be the proof node (rewrite, (), (R, t)), which proves the
equality $t \approx t{\uparrow}{\downarrow}_R$. During reconstruction, we replace P with a proof involving
only fine-grained rules, depending on the rewrite method R. For example, if
R is the core rewriter, we run the rewriter again on t in proof tracking mode.
Normally, the core rewriter performs a term traversal and applies atomic rewrites
to completion. In proof tracking mode, it also returns two lists, for pre- and
post-rewrites, of steps $(t_1 \approx s_1, g), \ldots, (t_n \approx s_n, g)$, where g is a proof generator that
returns (atom_rewrite, (), (R, $t_i$)) for each equality $t_i \approx s_i$. Furthermore, for each
Skolem variable k that is a subterm of t, we construct the rewrite step $(k \approx k{\uparrow}, g')$, where
$g'$ is a proof generator that returns (witness, (), (k)) for equalities $k \approx k{\uparrow}$. We
add these rewrite proof steps to a term-conversion generator C with the
rewrite-to-fixpoint policy, and replace P by getProof(C, $t \approx t{\uparrow}{\downarrow}_R$).


5 SMT Proofs 


Here we briefly describe each component shown in Sect. 2 and how it produces 
proofs with the infrastructure from Sects. 3 and 3.2. 


5.1 Preprocessing Proofs 


The preprocessor transforms an input formula $\varphi$ into a list of formulas to be
given to the core solver. It applies a sequence of preprocessing passes. A pass
may replace a formula $\varphi_i$ with another one $\varphi'_i$, in which case it is responsible for
providing a proof of $\varphi_i \approx \varphi'_i$. It may also append a new formula $\psi$ to the list,
in which case it is responsible for providing a proof for it. We use a (lazy) proof
generator that tracks these proofs, maintaining the invariant that a proof can be 
provided for all (preprocessed) formulas when requested. We have instrumented 
proof production for the most common preprocessing passes, relying heavily on 
the sr rule to model transformations such as expansion of function definitions 
and, with witness forms, Skolemization and if-then-else elimination [6]. 


Simplification Under Global Assumptions. cvc5 aggressively learns literals that
hold globally by performing Boolean constraint propagation over the input
formula. When a learned literal corresponds to a variable elimination (e.g., $x \approx 5$
corresponds to $x \mapsto 5$) or a constant propagation (e.g., $P(x)$ corresponds to
$P(x) \mapsto \top$), we apply the corresponding (term) substitution to the input. This
application is justified via sr, while the derivation of the globally learned literals 
is justified via clausification and resolution proofs, as explained in Sect. 5.3. 
The key features of our architecture that make it feasible to produce proofs 
for this simplification are the automatic reconstruction of sr steps and the abil- 
ity to customize the strategy for substitution application during reconstruction, 
as detailed in Sect. 3.2. When a new variable elimination $x \mapsto t$ is learned, old
ones need to be normalized to eliminate any occurrences of x in their right-hand 
sides. Computing the appropriate simultaneous substitution for all eliminations 
requires quadratically many traversals over those terms. We have observed that 
the size of substitutions generated by this preprocessing pass can be very large 
(with thousands of entries), which makes this computation prohibitively expen- 
sive. Using the fixed-point strategy, however, the reconstruction for the sr steps 
can apply the substitution efficiently and its complexity depends on how many 
applications are necessary to reach a fixed point, which is often low in practice.


5.2 Theory Proofs 


The theory engine produces lemmas, as disjunctions of literals, from an indi- 
vidual theory or a combination of them. In the first case, the lemma’s proof is 
provided directly by the corresponding theory solver. In the second case, a
theory solver may produce a lemma $\psi$ containing a literal $\ell$ derived by some other
theory solver from literals $\vec{\ell}$. A lemma over the combined theory is generated by
replacing $\ell$ in $\psi$ by $\vec{\ell}$. This regression process, which is similar to the
computation of explanations during solving, is repeated until the lemma contains only
input literals. The proof of the final lemma then uses rules like sr to combine the 
proofs of the intermediate literals derived locally in various theories and their 
replacement by input literals in the final lemma. 


Equality and Uninterpreted Function (EUF) Proofs. The EUF solver can be 
easily instrumented to produce proofs [31,42] with equality rules (see Fig. 2). 
In cvc5, term equivalences are also derived via rewriting in some other theory 
T: when a function from T has all of its arguments inferred to be congruent to 
T-values, it may be rewritten into a T-value itself, and this equivalence asserted. 
Such equivalences are justified via sr steps. Since generating equality proofs 
incurs minimal overhead [42] and rewriting proofs are reconstructed lazily, EUF 
proofs are generated during solving and stored in an eager proof generator. 


Extensional Arrays and Datatypes Proofs. While these two theories differ sig- 
nificantly, they both combine equality reasoning with rules for handling their 
particular operators. For arrays, these are rules for select, store, and array exten- 
sionality (see [36, Sec. 5]). For datatypes, they are rules reflecting the properties 
of constructors and selectors, as well as acyclicity. The justifications for lemmas 
are also generated eagerly and stored in an eager proof generator. 


Bit-Vector Proofs. The bit-vector solver applies bit-blasting to reduce bit-vector
problems to equisatisfiable propositional problems. Thus, its lemmas amount
to the rewriting of the bit-vector literals into Boolean formulas, which will be
solved and proved by the propositional engine. The bit-vector lemmas are proven 
lazily, analogous to sr steps, with the difference that the reconstruction uses the 
bit-blaster in the bit-vector solver instead of the rewriter. 


Arithmetic Proofs. The linear arithmetic solver is based on the simplex algo- 
rithm [24], and each of its lemmas is the negation of an unsatisfiable conjunction 
of inequalities. Farkas’ lemma [30,49] guarantees that there exists a linear
combination of these inequalities equivalent to $\bot$. The coefficients of the combination
are computed during solving with minimal overhead [38], and the equivalence 
is proven with an sr step. To allow the rewriter to prove this equivalence, the 
bounds of the inequalities are scaled by constants and summed during recon- 
struction. Integer reasoning is proved through rules for branching and integer 
bound tightening, recorded eagerly. 
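Checking such a linear combination needs only exact rational arithmetic. The Python sketch below is a generic illustration of Farkas' lemma, not cvc5's proof rule. It assumes each inequality is given as a coefficient map and a bound, read as $\sum_x a_x x \le b$ (or $< b$ when flagged strict), and that the certificate is a list of non-negative multipliers. The scaled sum is contradictory when every variable coefficient cancels and the combined bound is negative, or zero with some strict inequality involved.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# (coefficients, bound, strict) encodes sum(c * x) <= bound, or < bound if strict.
Ineq = Tuple[Dict[str, Fraction], Fraction, bool]


def check_farkas(ineqs: List[Ineq], lambdas: List[Fraction]) -> bool:
    """True if the lambda-weighted sum of ineqs reduces to a false constant fact."""
    if len(ineqs) != len(lambdas) or any(lam < 0 for lam in lambdas):
        return False
    combined: Dict[str, Fraction] = {}
    bound = Fraction(0)
    strict = False
    for (coeffs, b, is_strict), lam in zip(ineqs, lambdas):
        if lam == 0:
            continue  # unused inequalities do not contribute
        for var, c in coeffs.items():
            combined[var] = combined.get(var, Fraction(0)) + lam * c
        bound += lam * b
        strict = strict or is_strict
    if any(c != 0 for c in combined.values()):
        return False  # the variables do not cancel out
    # The sum now reads 0 <= bound (or 0 < bound), which is false exactly when:
    return bound < 0 or (strict and bound == 0)


F = Fraction
# x <= 1, y <= 2 and x + y >= 4 (written as -x - y <= -4) are jointly unsatisfiable.
ineqs = [({"x": F(1)}, F(1), False),
         ({"y": F(1)}, F(2), False),
         ({"x": F(-1), "y": F(-1)}, F(-4), False)]
print(check_farkas(ineqs, [F(1), F(1), F(1)]))  # True
```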

Non-linear arithmetic lemmas are generated from incremental linearization 
[16] or cylindrical algebraic coverings [1]. The former can be proven via propo- 
sitional and basic arithmetic rules, with only a few, such as the tangent plane 
lemma, needing a dedicated proof rule. The latter requires two complex rules 
that are not inherently simpler than solving, albeit not as complex as those for 
regular CAD-based theory solvers [2]. We point out that checking these rules 
would require a significant portion of CAD-related theory, whose proper
formalization is still an open, if actively researched, problem [18,25,34,41,53].


Quantifier Proofs. Quantified formulas not Skolemized during preprocessing are
handled via instantiation, which produces theory lemmas of the form
$(\forall \vec{x}\, \varphi) \Rightarrow \varphi\sigma$, where $\sigma$ is a grounding substitution. An instantiation rule proves them
independently of how the substitution was actually derived, since any well-typed 
one suffices for soundness. 


String Proofs. The strings solver applies a layered approach, distinguishing 
between core [40] and extended operators [48]. The core operators consist of 
(dis)equalities between string concatenations and length constraints. Reasoning 
over them is proved by a combination of equality and linear integer arithmetic 
proofs, as well as specific string rules. The extended operators are reduced to core 
ones via formulas with bounded quantifiers. The reductions are proven with rules 
defining each extended function’s semantics, and sr steps justifying the reduc- 
tions. Finally, regular membership constraints are handled by string rules that 
unfold occurrences of the Kleene star operator and split up regular expression 
concatenations into different parts. Overall, the proofs for the strings theory 
solver encompass not only string-specific reasoning but also equality, linear inte- 
ger arithmetic, and quantifier reasoning, as well as substitution and rewriting. 


Unsupported. The theory solvers for the theories of floating-point arithmetic, 
sequences, sets and relations, and separation logic are currently not proof- 
producing in cvc5. These are relatively new or non-standard theories in SMT 
and have not been our focus, but we intend to produce proofs for them in the 
future. 




Table 1. Cumulative solving times (s) on benchmarks solved by all configurations,
with the slowdown versus CVC+S in parentheses.


Logics     # Benchmarks   CVC+OS   CVC+S   CVC+SP        CVC+SPR
NON-BVs    116,321        164k     166k    284k (1.7x)   299k (1.8x)
BVs        29,192         45k      57k     150k (2.6x)   224k (3.9x)


5.3 Propositional Proofs 


Propositional proofs justify both the conversion of preprocessed input formulas 
and theory lemmas into conjunctive normal form (CNF) and the derivation of 
$\bot$ from the resulting clauses. CNF proofs are a combination of Boolean
transformations and introductions of Boolean formulas representing the definition of
Tseytin variables, used to ensure that the CNF conversion is polynomial. The 
clausifier uses a lazy proof builder which stores the clausification steps eagerly, 
with the preprocessed input formulas as assumptions, and the theory lemmas as 
lazy steps, with associated proof generators. 
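The role of Tseytin variables can be illustrated with the textbook Tseytin transformation. The Python sketch below is generic, not cvc5's clausifier: it handles only negation, conjunction, and disjunction, encodes literals DIMACS-style as signed integers (an assumption of the sketch), and gives each conjunction or disjunction a fresh variable v together with clauses defining v as equivalent to that subformula, so the number of clauses grows linearly with the formula.

```python
from typing import Dict, List, Union

# Hypothetical encoding: atoms are str; ("not", f), ("and", f, g, ...), ("or", f, g, ...).
Formula = Union[str, tuple]


class Tseytin:
    def __init__(self) -> None:
        self.var_of: Dict[Formula, int] = {}   # atoms and Tseytin variables
        self.clauses: List[List[int]] = []

    def _fresh(self, f: Formula) -> int:
        self.var_of[f] = len(self.var_of) + 1
        return self.var_of[f]

    def encode(self, f: Formula) -> int:
        """Return a literal equivalent to f, adding its definition clauses."""
        if f in self.var_of:
            return self.var_of[f]
        if isinstance(f, str):
            return self._fresh(f)
        op, args = f[0], [self.encode(a) for a in f[1:]]
        if op == "not":
            return -args[0]
        v = self._fresh(f)
        if op == "and":    # v <-> (a1 & ... & an)
            self.clauses += [[-v, a] for a in args]
            self.clauses.append([v] + [-a for a in args])
        elif op == "or":   # v <-> (a1 | ... | an)
            self.clauses += [[v, -a] for a in args]
            self.clauses.append([-v] + args)
        else:
            raise ValueError(f"unsupported connective {op!r}")
        return v

    def clausify(self, f: Formula) -> List[List[int]]:
        self.clauses.append([self.encode(f)])  # assert the top-level formula
        return self.clauses


F = ("or", ("and", "p", "q"), ("not", "p"))
t = Tseytin()
print(t.clausify(F))
```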

For Boolean reasoning, cvc5 uses a version of MiniSat [27] instrumented to 
produce resolution proofs. It uses a lazy proof builder to record resolution steps 
for learned clauses as they are derived (see [7, Chap. 1] for more details) and to
lazily build a refutation with only the resolution steps necessary for deriving $\bot$.
The resolution rule, however, is ground first-order resolution, since the proofs are 
in terms of the first-order clauses rather than their propositional abstractions. 
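A single ground first-order resolution step behaves like propositional resolution, except that the atoms are ground first-order formulas compared syntactically rather than propositional variables. A minimal Python sketch, assuming clauses are frozensets of (atom, polarity) pairs with atoms written as strings such as "f(a) = b":

```python
from typing import FrozenSet, Tuple

# Hypothetical encoding: a literal is (atom, polarity); atoms are ground
# first-order formulas written as strings and compared syntactically.
Literal = Tuple[str, bool]
Clause = FrozenSet[Literal]


def resolve(c1: Clause, c2: Clause, pivot: str) -> Clause:
    """Resolve c1, containing pivot, with c2, containing its negation."""
    if (pivot, True) not in c1 or (pivot, False) not in c2:
        raise ValueError("pivot must occur positively in c1 and negatively in c2")
    return (c1 - {(pivot, True)}) | (c2 - {(pivot, False)})


c1 = frozenset({("f(a) = b", True), ("p(b)", True)})
c2 = frozenset({("f(a) = b", False)})
c3 = frozenset({("p(b)", False)})
r = resolve(c1, c2, "f(a) = b")               # the unit clause p(b)
print(resolve(r, c3, "p(b)") == frozenset())  # True: the empty clause
```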


6 Evaluation 


In this section, we discuss an initial evaluation of our implementation in cvc5 of 
the proof-production architecture presented in this paper. In the following, we 
denote different configurations of cvc5 by CVC plus some suffixes. A configuration
using variable and clause elimination in the SAT solver [26], symmetry
breaking [23] in the EUF solver, and black-box SAT solving in the bit-vector (BV)
solver is denoted by the suffix O. These techniques are currently incompatible
with the proof production architecture. Other cvc5 techniques for which we do
not yet support fine-grained proofs, however, are active and have their inferences
registered in the proofs as trusted steps. A configuration that includes
simplification under global assumptions is denoted by S; one that includes producing
proofs by P; and one that additionally reconstructs proofs by R. The default
configuration of cvc5 is CVC+OS.

We split our evaluation into measuring the proof-production cost as well 
as the performance impact of making key techniques proof-producing; the proof 
reconstruction overhead; and the coverage of the proof production. We also
comment on how cvc5’s proofs compare with CVC4’s proofs. Note that the internal
proof checking described in Sect. 3, which was invaluable for a correct implemen- 
tation, is disabled for evaluating performance. Experiments ran on a cluster with 
Fig. 3. (a) Cactus plot for NON-BVs; (b) cactus plot for BVs; (c) scatter plot of overall
proof cost; (d) reconstruction cost.


Intel Xeon E5-2620 v4 CPUs, with a time limit of 300 s and 8 GB of RAM for each solver and
benchmark pair. We consider 162,060 unsatisfiable problems from SMT-LIB [8],
across all logics except those with floating-point arithmetic, as determined by
cvc5 [5, Sec. 4]. We split them into 38,732 problems with the BV theory (the
BVs set) and 123,328 problems without (the NON-BVs set).


Proof Production Cost. The cost of proof production is summarized in Table 1 
and Figs. 3a to 3d. The impact of running without O is negligible overall in
NON-BVs, but steep for BVs, both in terms of solving time and number of problems
solved, as evidenced by the table and Fig. 3b respectively. This is expected given 
the effectiveness of combining bit-blasting with black-box SAT solvers. The over- 
head of P is similar for both sets, although more pronounced in BVs. While the 
total time is around double that of CVC+S, Fig. 3c shows a finer distribution,
with most problems having a less significant overhead. Moreover, the total num- 
ber of problems solved is quite similar, as shown in Figs. 3a and 3b, particularly 
for NON-BVs. The difference in overhead due to P between the BVs and
NON-BVs sets can be attributed to the cost of managing large proofs, which are
more common in BVs. This stems from the well-known blow-up in problem size 
incurred by bit-blasting, which is reflected in the proofs. 

The cost of generating fine-grained steps for the sr rule and for the similarly 
reconstructed theory-specific steps mentioned in Sect. 5 varies again between
the two sets, but more starkly. While for NON-BVs the overall solving time and
number of problems solved are very similar between CVC+SP and CVC+SPR, for
the BVs set CVC+SPR is significantly slower overall. This difference again arises 
mainly because of the increased proof sizes. Nevertheless, R leads to only a small 
increase in unsolved problems in BVs, as shown in Fig. 3b. 

The importance of being able to produce proofs for simplification under 
global assumptions is made clear by Fig. 3a: the impact of disabling S is
virtually the same as that of adding P; moreover, CVC+SPR significantly outperforms
CVC+PR. In Fig. 3b the difference is less pronounced but still noticeable.


Proofs Coverage. When using techniques that are not yet fully proof-producing, 
but still active, cvc5 inserts trusted steps in the proof. These are usually steps 
whose checking is not inherently simpler than solving. They effectively represent 
holes in the proof, but are still useful for users who avail themselves of powerful 
proof-checking techniques. Trusted steps are commonly used when integrating 
SMT solvers into proof assistants [11,28,51].

The percentage of cvc+SPR proofs without trusted steps is 92% for BVs and 80% for NON-BVs. That is, 120,473 of 145,683 proofs are fully fine-grained. Most trusted steps in the remaining proofs come from theory-specific preprocessing passes that are not yet fully proof-producing. In NON-BVs, the occurrence of trusted steps depends heavily on the specific SMT-LIB logic, as expected. Common offenders are logics with datatypes, which use trusted steps for acyclicity checks, and quantified logics, which use trusted steps for certain α-equivalence eliminations. In non-linear real arithmetic logics, all cylindrical algebraic coverings proofs are built with trusted steps (see Sect. 5.2), but we note this is the state of the art for CAD-based proofs. For non-linear integer arithmetic logics, our proof support is still in its early stages, so a significant portion of their theory lemmas are trusted steps.

We stress the extent of our coverage for string proofs, which were previously 
unsupported by any SMT solver. In the string logics without length constraints, 
100% of the proofs are fully fine-grained. This rate goes down to 80% in the 
logics with length. For the remaining 20%, the overwhelming majority of the 
trusted steps are for theory-specific preprocessing or some particular string or 
linear arithmetic inference within the proof of a theory lemma. 


Comparison with CVC4 Proofs. We compare the proof coverage of cvc5 with that of CVC4. The cvc5 proof production replaces that of CVC4 [32,36], which was incomplete and monolithic. CVC4 did not produce proofs at all for strings, substitutions, rewriting, preprocessing, quantifiers, datatypes, or non-linear arithmetic. In particular, simplification over global assumptions had to be disabled when producing proofs. In fragments supported by both systems, CVC4's proofs are at most as detailed as cvc5's. The only superior aspect of CVC4's proof production was its support for proofs from the external SAT solvers [45] used in the BV solver. These solvers matter a great deal for solving performance, as shown above. Integrating this feature into cvc5 is left as future work, but nothing in the proof architecture prevents it. We also point out that cvc5 produces resolution proofs for the bit-blasted BV constraints, which can be checked in polynomial time. External SAT solvers, in contrast, produce DRAT proofs [33] (or reconstructions of them via other tools [19,20,37,39]), which can take exponential time to check. So there is a significant trade-off to consider.


7 Related Work 


Two significant proof-producing state-of-the-art SMT solvers are z3 [22] and 
veriT [14]. Both can have their proofs successfully reconstructed in proof assis- 
tants [3,12,13,51]. They can produce detailed proofs for the propositional and 
theory reasoning in EUF and linear arithmetic, as well as for quantifiers. How- 
ever, z3’s proofs are coarse-grained for preprocessing and rewriting, and for bit- 
vector reasoning, which complicates proof checking. Moreover, to the best of our 
knowledge, z3 does not produce proofs for its other theories. In contrast, veriT 
can produce fine-grained proofs for preprocessing and rewriting [6], which has led 
to a better integration with Isabelle/HOL [51]. However, it does so eagerly, which 
requires a tight integration between the preprocessing and the proof-production 
code. In addition, it does not support simplification under global assumptions 
when producing proofs, which significantly impacts its performance. Other proof- 
producing SMT solvers are MathSAT5 [17] and SMTInterpol [15]. They produce 
resolution proofs and theory proofs for EUF, linear arithmetic, and, in SMTIn- 
terpol’s case, array theories. Their proofs are tailored towards unsatisfiable core 
and interpolant generation, rather than external certification. Moreover, they do 
not seem to provide proofs for preprocessing, clausification or rewriting. 

cvc5 is possibly the only proof-producing solver for the full theory of strings. For the fragment with concatenation and regular expressions, however, CertiStr [35] is a certified solver. It is automatically generated from Isabelle/HOL [44] and is significantly less performant than cvc5, although a proper comparison would need to account for proof-checking time in cvc5's case.


8 Conclusion and Future Work 


We presented and evaluated a flexible proof production architecture. We showed that it can produce proofs with varying levels of granularity in a scalable manner for a state-of-the-art, industrial-strength SMT solver like cvc5. Since there is currently no standard proof format for SMT solvers, our architecture supports multiple proof formats via a final post-processing transformation that converts internal proofs accordingly. We are developing backends for the LFSC [52] proof checker and the proof assistants Lean 4 [21], Isabelle/HOL [44], and Coq [10], the latter two via the Alethe proof format [50]. Using these tools requires mechanizing the respective target proof calculi in their languages. Besides enabling external checking, this decouples confidence in the soundness of the proof calculi from the internal cvc5 proof calculus.

A considerable challenge for SMT proofs is the plethora of rewrite rules used by the solvers, which are specific to each theory and vary in complexity. In particular, string rewrites can be very involved [46] and hard to check. We are also developing an SMT-LIB-based DSL for specifying rewrite rules. During proof reconstruction, these rules would be used to decompose rewrite steps, thus providing more fine-grained proofs for rewriting.


Finally, we plan to incorporate the unsupported theories and features mentioned in Sects. 5.2 and 6 into the proof-production architecture. We will focus on those relevant for solving performance, which currently either leave holes in proofs, such as theory preprocessing or non-linear arithmetic reasoning, or have to be disabled, such as the use of external SAT solvers in the BV theory.
Abstract. The analysis of complex dynamic systems is a core research topic in formal methods and AI. Combined modelling of systems with data has gained increasing importance in applications such as business process management. In addition, process mining techniques are nowadays used to automatically mine process models from event data, often without correctness guarantees. Thus verification techniques for linear and branching time properties are needed to ensure desired behavior.

Here we consider data-aware dynamic systems with arithmetic (DDSAs), a concise but expressive formalism of transition systems with linear arithmetic guards. We present a $\mathrm{CTL}^*_f$ model checking procedure for DDSAs that addresses a generalization of the classical verification problem: computing conditions on the initial state, called witness maps, under which the desired property holds. Linear-time verification was shown to be decidable for specific classes of DDSAs where the constraint language or the control flow is suitably confined. We investigate several of these restrictions for the case of $\mathrm{CTL}^*_f$, with both positive and negative results. Witness maps can always be found for monotonicity and integer periodicity constraint systems, but verification of bounded lookback systems is undecidable. To demonstrate the feasibility of our approach, we implemented it in an SMT-based prototype and show that many practical business process models can be effectively analyzed.


Keywords: Verification - CTL* - Counter systems - Constraints - 
SMT 


1 Introduction 


The study of complex dynamic systems is a core research topic in AI, with a long tradition in formal methods. It finds application in a variety of domains, notably business process management (BPM), where studying the interplay between control flow and data has gained momentum [9,10,24,46]. Processes are increasingly mined by automatic techniques [1,3] that lack any correctness guarantees, which makes verification even more important to ensure the desired behavior. However, the presence of data pushes verification to the verge of undecidability due to an infinite state space. Arithmetic aggravates this, despite its importance for practical applications [24]. Indeed, model checking of transition systems operating on numeric data variables with arithmetic constraints is known to be undecidable, since such systems can easily model a two-counter machine.

In this work, we focus on the concise but expressive framework of data-aware dynamic systems with arithmetic (DDSAs) [28,38], also known as counter systems [13,20,34]. Several classes of DDSAs have been isolated where specific verification tasks are decidable, notably reachability [6,13,29,34] and linear-time model checking [14,20,22,28,38]. Fewer results are known for branching time. The exceptions are flat counter systems [21], gap-order systems where constraints are restricted to the form $x - y \geq k$ [8,42], and systems with a nice symbolic valuation abstraction [31]. However, many processes in BPM and beyond fall into none of these classes, as illustrated by the next example.


Example 1. The following DDSA $\mathcal{B}$ models a management process for road fines by the Italian police [41]. It maintains eight so-called case data variables, i.e., variables local to each process instance, which the BPM literature calls a "case": $a$ (amount), $t$ (total amount), $d$ (dismissal code), $p$ (points deducted), $e$ (expenses), and the time durations $d_s$, $d_p$, $d_j$. The process starts by creating a case, after which the offender is notified within 90 days, i.e., 2160 h (send fine). If the offender pays a sufficient amount $t$, the process terminates via silent actions $\tau_1$, $\tau_2$, or $\tau_3$. On the less happy paths, the credit collection action is triggered if the payment was insufficient. The actions appeal to judge and appeal to prefecture reflect protests filed by the offender, which again need to respect certain time constraints.

[Figure: DDSA $\mathcal{B}$ for the road fines process. Actions: create fine, send fine, insert notification, payment, add penalty, credit collection, appeal to judge, appeal to prefecture, send to prefecture, result prefecture, and silent actions $\tau_1$ to $\tau_5$; edges are labeled with linear arithmetic guards over the case variables.]


This model was generated from real-life logs by automatic process mining techniques paired with domain knowledge [41], but without any correctness guarantee. For instance, data-aware soundness [4,25] requires that the process can always reach a final state from any reachable configuration, which is expressed by the branching-time property $\mathsf{AG}\,\mathsf{EF}\,\mathit{end}$. This property is false here, since $\mathcal{B}$ can get stuck in state $p_7$ if $d > 1$. In addition, process-specific linear-time properties are needed. An example is the requirement that a send fine event is always followed by a sufficient payment, i.e., $\langle \mathit{send\ fine}\rangle\top \rightarrow \mathsf{F}\,\langle \mathit{payment}\rangle(t > a)$, where $\langle a \rangle$ is the next operator via action $a$.


38 P. Felli et al. 


This example shows that both linear-time and branching-time verification are needed. In this paper, we present a CTL* model checking algorithm for DDSAs. It adopts a finite-trace semantics ($\mathrm{CTL}^*_f$) [44] to reflect the nature of processes as in Example 1. More precisely, our approach can synthesize conditions on the initial variable assignment, called witness maps, under which a given property $\chi$ holds. If such a witness map can be found, then in particular the more commonly studied verification problem is decidable, namely whether $\chi$ is satisfied in a designated initial configuration. We derive an abstract criterion for the computability of witness maps. Two practical DDSA classes satisfy this criterion, each restricting the constraint language: (a) monotonicity constraints [20,25], i.e., variable-to-variable or variable-to-constant comparisons over $\mathbb{Q}$ or $\mathbb{R}$, and (b) integer periodicity constraints [18,22], i.e., variable-to-constant and restricted variable-to-variable comparisons with modulo operators. On the other hand, we show that the verification problem is undecidable for bounded lookback systems [28], a control flow restriction that generalizes feedback freedom [14].

In summary, we make the following contributions: 


1. We present a model checking algorithm to generate a witness map for a given DDSA and $\mathrm{CTL}^*_f$ property;

2. We prove an abstract termination criterion for this algorithm (Corollary 1); 

3. This result is used to show that witness maps can be effectively computed for 
monotonicity constraint and integer periodicity constraint systems; 

4. $\mathrm{CTL}^*_f$ verification is shown undecidable for bounded-lookback systems;

5. We implemented our approach in the prototype ada using SMT solvers as 
backends and tested it on a range of business processes from the literature. 


The paper is structured as follows. The rest of this section recapitulates related work. Section 2 compiles preliminaries about DDSAs and $\mathrm{CTL}^*_f$. Section 3 is dedicated to LTL with configuration maps, which our model checking procedure in Sect. 4 uses. Based on an abstract termination criterion, Sect. 5 gives (un)decidability results for concrete DDSA classes. We describe our implementation in Sect. 6. Complete proofs and further examples can be found in [27].


Related work. Verification of transition systems with arithmetic constraints, also called counter systems, has been studied in many areas, including formal methods, database theory, and BPM. Reachability was proven decidable for a variety of classes, e.g., reversal-bounded counter machines [34], finite linear [29], flat [13], and gap-order constraint (GC) systems [6]. Considerable work has also been dedicated to linear-time verification. LTL model checking is decidable for monotonicity constraint (MC) systems [20]. LTL verification is also decidable for integer periodicity constraint (IPC) systems, even with past-time operators [18,22]. The same holds for feedback-free systems with an enriched constraint language referring to a read-only database [14]. DDSAs with MCs are also considered in [25] from the perspective of LTL with a finite-run semantics ($\mathrm{LTL}_f$), where a procedure to compute finite, faithful abstractions is given. $\mathrm{LTL}_f$ is moreover decidable for systems with the abstract finite summary property [28]. This property covers MC systems, GC systems, and systems with bounded lookback, where the latter generalizes feedback freedom.
Branching-time verification has been less studied. Decidability of CTL* was proven for flat counter systems with Presburger-definable loop iteration [21], even in NP [19]. Moreover, CTL* verification was shown to be decidable for pushdown systems, which can model counter systems with a single integer variable [30]. Integer relational automata (IRA) are systems with constraints $x \geq y$ or $x > y$ and domain $\mathbb{Z}$. For IRAs, CTL model checking is undecidable, while the existential and universal fragments of CTL* remain decidable [12]. GC systems extend IRAs with constraints of the form $x - y \geq k$. For them, the existential fragment of CTL* is decidable while the universal one is not [8]. A similar dichotomy holds for the EF and EG fragments of CTL [42]. A subclass of IRAs allowing only periodicity and monotonicity constraints was considered in [7,11]. Satisfiability of CTL* was proven decidable for it, but model checking is not (as already shown in [12]). Model checking is, however, decidable for CEF* properties, an extension of the EF fragment [7]. In contrast, rather than restricting temporal operators, we show decidability of model checking under an abstract property of the DDSA and the verified property. This property can be guaranteed by suitably restricting the constraint language or the control flow. More closely related is work by Gascon [31], who shows decidability of CTL* model checking for DDSAs that admit a nice symbolic valuation abstraction, an abstract property that includes MC and IPC systems. The relationship between our decidability criterion and Gascon's property needs further investigation. Another difference is that we adopt a finite-path semantics for CTL*, as considered e.g. in [47]. For the analysis of real-world processes such as business processes, it is sufficient to consider finite traces. At a high level, our method follows a common approach to CTL*: the verification property is processed bottom-up, computing solutions for each subproperty. These are then used to solve an equivalent linear-time problem [2, p. 429]. For the latter, we partially rely on earlier work [28].


2 Background 


We start by defining the set of constraints over expressions of sort int, rat, or real, with associated domains $\mathit{dom}(\mathit{int}) = \mathbb{Z}$, $\mathit{dom}(\mathit{rat}) = \mathbb{Q}$, and $\mathit{dom}(\mathit{real}) = \mathbb{R}$.


Definition 1. For a given set of sorted variables $V$, expressions $e_s$ of sort $s$ and atoms $a$ are defined as follows:

$$e_s ::= v_s \mid k_s \mid e_s + e_s \mid e_s - e_s \qquad a ::= e_s = e_s \mid e_s < e_s \mid e_s \leq e_s \mid e_{\mathit{int}} \equiv_n e_{\mathit{int}}$$

where $k_s \in \mathit{dom}(s)$, $v_s \in V$ has sort $s$, and $\equiv_n$ denotes equality modulo some $n \in \mathbb{N}$. A constraint is then a quantifier-free Boolean expression over atoms $a$.


The set of all constraints built from atoms over variables $V$ is denoted by $\mathcal{C}(V)$. For instance, $x \neq 1$, $x < y - z$, and $x - y = 2 \wedge y \neq 1$ are valid constraints independent of the sort of $\{x, y, z\}$, while $u \equiv_3 v + 1$ is a constraint for integer variables $u$ and $v$. We write $\mathit{Var}(\varphi)$ for the set of variables in a formula $\varphi$. For an assignment $\alpha$ with domain $V$ that maps variables to values in their domain, and a formula $\varphi$, we write $\alpha \models \varphi$ if $\alpha$ satisfies $\varphi$.

We are thus in the realm of SMT with linear arithmetic, which is decidable and admits quantifier elimination [45]. If $\varphi$ is a formula in $\mathcal{C}(X \cup \{y\})$, thus having free variables $X \cup \{y\}$, then there is a quantifier-free $\varphi'$ with free variables $X$ that is equivalent to $\exists y.\varphi$, i.e., $\varphi' \equiv \exists y.\varphi$, where $\equiv$ denotes logical equivalence.
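Quantifier elimination is what makes the symbolic operations used later effective. As an illustration only, and not necessarily the procedure the authors rely on, the following Python sketch applies one Fourier-Motzkin elimination step. It removes an existentially quantified variable from a conjunction of linear inequalities over $\mathbb{Q}$. The sketch handles only conjunctions of `<` and `≤` atoms. Equalities must be split into two inequalities, and modular integer atoms ($\equiv_n$) would need a different method such as Cooper's algorithm. The atom encoding and the function name `eliminate` are hypothetical choices for this sketch.

```python
from fractions import Fraction
from itertools import product

# A linear atom is a triple (coeffs, const, strict) standing for
#   sum(coeffs[v] * v) + const < 0   if strict,   <= 0 otherwise.
# Equalities must be split into two atoms; modular atoms are not handled.

def eliminate(atoms, y):
    """Fourier-Motzkin step: atoms equivalent to (exists y. AND atoms) over Q."""
    lower, upper, rest = [], [], []
    for atom in atoms:
        c = atom[0].get(y, Fraction(0))
        if c > 0:
            upper.append(atom)   # c*y + r (<) 0 bounds y from above
        elif c < 0:
            lower.append(atom)   # bounds y from below
        else:
            rest.append(atom)    # y does not occur
    # Each (lower, upper) pair yields one y-free atom; a bound with no
    # partner imposes nothing, because Q is unbounded and dense.
    for (lc, lk, ls), (uc, uk, us) in product(lower, upper):
        a, b = -lc[y], uc[y]     # both positive
        coeffs = {}
        for v in set(lc) | set(uc):
            if v != y:
                val = b * lc.get(v, Fraction(0)) + a * uc.get(v, Fraction(0))
                if val != 0:
                    coeffs[v] = val
        rest.append((coeffs, b * lk + a * uk, ls or us))
    return rest

F = Fraction
# exists y. x - y < 0 and y - 3 < 0, i.e. exists y. x < y < 3
atoms = [({"x": F(1), "y": F(-1)}, F(0), True),
         ({"y": F(1)}, F(-3), True)]
print(eliminate(atoms, "y"))  # one atom: x - 3 < 0, i.e. x < 3
```

Scaling the lower bound by `b` and the upper bound by `a` cancels $y$. The result is strict whenever either input is, which is exact over a dense domain.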


2.1 Data-Aware Dynamic Systems with Arithmetic 


From now on, $V$ is a fixed, finite set of variables. We consider two disjoint, marked copies of $V$, denoted $V^r = \{v^r \mid v \in V\}$ and $V^w = \{v^w \mid v \in V\}$ and called the read and write variables. They refer to variable values before and after a transition, respectively. We also write $\overline{V}$ for a vector that orders $V$ in an arbitrary but fixed way, and $\overline{V}^r$ and $\overline{V}^w$ for vectors ordering $V^r$ and $V^w$ in the same way.


Definition 2. A DDSA $\mathcal{B} = \langle B, b_I, \mathcal{A}, T, B_F, V, \alpha_I, \mathit{guard}\rangle$ is a labeled transition system where (i) $B$ is a finite set of control states, with $b_I \in B$ the initial one; (ii) $\mathcal{A}$ is a set of actions; (iii) $T \subseteq B \times \mathcal{A} \times B$ is a transition relation; (iv) $B_F \subseteq B$ are final states; (v) $V$ is the set of process variables; (vi) $\alpha_I$ is the initial variable assignment; (vii) $\mathit{guard}\colon \mathcal{A} \to \mathcal{C}(V^r \cup V^w)$ specifies executability constraints for actions over variables $V^r \cup V^w$.


Example 2. We consider the following DDSAs $\mathcal{B}$, $\mathcal{B}_{bl}$, and $\mathcal{B}_{ipc}$, where $x$, $y$ have domain $\mathbb{Q}$ and $u$, $v$, $s$ have domain $\mathbb{Z}$. Initial states have incoming arrows and final states have double borders; $\alpha_I$ is not fixed for now.


[Figure: the DDSAs $\mathcal{B}$ (control states $b_1$, $b_2$, $b_3$; actions $a_1$, $a_2$, $a_3$), $\mathcal{B}_{bl}$, and $\mathcal{B}_{ipc}$, with each edge labeled by the guard of its action.]


The system in Example 1 is also a DDSA. If state $b$ admits a transition to $b'$ via action $a$, namely $(b, a, b') \in T$, this is denoted by $b \xrightarrow{a} b'$. A configuration of $\mathcal{B}$ is a pair $(b, \alpha)$ where $b \in B$ and $\alpha$ is an assignment with domain $V$. A guard assignment is an assignment $\beta$ with domain $V^r \cup V^w$. For an action $a$, let $\mathit{write}(a) = \mathit{Var}(\mathit{guard}(a)) \cap V^w$. As defined next, an action $a$ transforms a configuration $(b, \alpha)$ into a new configuration $(b', \alpha')$ by updating the assignment $\alpha$ according to the action guard. The guard can at the same time evaluate conditions on the current values of variables and write new values:


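Definition 3. A DDSA $\mathcal{B} = \langle B, b_I, \mathcal{A}, T, B_F, V, \alpha_I, \mathit{guard}\rangle$ admits a step from configuration $(b, \alpha)$ to $(b', \alpha')$ via action $a$, denoted $(b, \alpha) \xrightarrow{a} (b', \alpha')$, if three conditions hold: $b \xrightarrow{a} b'$; $\alpha'(v) = \alpha(v)$ for all $v \in V \setminus \mathit{write}(a)$; and the guard assignment $\beta$ given by $\beta(v^r) = \alpha(v)$ and $\beta(v^w) = \alpha'(v)$ for all $v \in V$ satisfies $\beta \models \mathit{guard}(a)$.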


For instance, for $\mathcal{B}$ in Example 2 and initial assignment $\alpha_I(x) = \alpha_I(y) = 0$, the initial configuration admits a step $(b_1, [x = 0, y = 0]) \xrightarrow{a_1} (b_2, [x = 0, y = 3])$ with $\beta(x^r) = \beta(x^w) = \beta(y^r) = 0$ and $\beta(y^w) = 3$.
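Definition 3 can also be read operationally. The Python sketch below checks whether a step is admitted. Guards are modelled as Python predicates over a guard assignment $\beta$, whose keys `x_r` and `x_w` stand for $x^r$ and $x^w$. The function name `admits_step` and this key convention are hypothetical. The figure of Example 2 is not recoverable, so the guard used for $a_1$ ($y^w > x^r$) is an assumed stand-in. It is only chosen to be consistent with the step shown above.

```python
from typing import Callable, Dict, Set, Tuple

Assignment = Dict[str, int]
Guard = Callable[[Dict[str, int]], bool]

def admits_step(transitions: Set[Tuple[str, str, str]],
                guards: Dict[str, Guard],
                writes: Dict[str, Set[str]],
                conf: Tuple[str, Assignment],
                action: str,
                conf2: Tuple[str, Assignment]) -> bool:
    """Check (b, alpha) --action--> (b2, alpha2) following Definition 3."""
    (b, alpha), (b2, alpha2) = conf, conf2
    if (b, action, b2) not in transitions:
        return False
    # variables not written by the action keep their value (inertia)
    if any(alpha2[v] != alpha[v] for v in alpha if v not in writes[action]):
        return False
    # guard assignment: read copies get old values, write copies new values
    beta = {f"{v}_r": alpha[v] for v in alpha}
    beta.update({f"{v}_w": alpha2[v] for v in alpha2})
    return guards[action](beta)

# Hypothetical fragment of B from Example 2: a1 with ASSUMED guard y^w > x^r.
T = {("b1", "a1", "b2")}
G = {"a1": lambda beta: beta["y_w"] > beta["x_r"]}
W = {"a1": {"y"}}  # only y occurs as a write variable in the assumed guard

print(admits_step(T, G, W, ("b1", {"x": 0, "y": 0}), "a1", ("b2", {"x": 0, "y": 3})))  # True
```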
A run $\rho$ of a DDSA $\mathcal{B}$ of length $n$ from configuration $(b, \alpha)$ is a sequence of steps $\rho\colon (b, \alpha) = (b_0, \alpha_0) \xrightarrow{a_1} (b_1, \alpha_1) \xrightarrow{a_2} \cdots \xrightarrow{a_n} (b_n, \alpha_n)$. We also associate with $\rho$ the symbolic run $\sigma\colon b_0 \xrightarrow{a_1} b_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} b_n$, which records the state and action sequences without assignments. We say that $\sigma$ is the abstraction of $\rho$, or that $\sigma$ abstracts $\rho$. For $m < n$, $\sigma|_m$ denotes the prefix of $\sigma$ that has $m$ steps.


2.2 History Constraints 


In this section, we fix a DDSA $\mathcal{B} = \langle B, b_I, \mathcal{A}, T, B_F, V, \alpha_I, \mathit{guard}\rangle$. We aim to build an abstraction of $\mathcal{B}$ that covers the (potentially infinite) set of configurations by finitely many nodes of the form $(b, \varphi)$. Here $b \in B$ is a control state and $\varphi$ is a formula that expresses conditions on the variables $V$. A node $(b, \varphi)$ thus represents all configurations $(b, \alpha)$ such that $\alpha \models \varphi$. To express how executing an action modifies such a formula $\varphi$, let the transition formula of action $a$ be $\Delta_a(\overline{V}^r, \overline{V}^w) = \mathit{guard}(a) \wedge \bigwedge_{v \in V \setminus \mathit{write}(a)} v^w = v^r$. This formula states conditions on variables before and after executing $a$: $\mathit{guard}(a)$ must hold, and the values of all variables that are not written are propagated by inertia. We write $\Delta_a(\overline{X}, \overline{Y})$ for the formula obtained from $\Delta_a$ by replacing $\overline{V}^r$ by $\overline{X}$ and $\overline{V}^w$ by $\overline{Y}$. A variable vector $\overline{U}$ is a fresh copy of $\overline{V}$ if it has the same length as $\overline{V}$ and $U \cap V = \emptyset$. To mimic steps on the abstract level, we define the following update function:


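Definition 4. For a formula $\varphi$ with free variables $V$ and action $a$, $\mathit{update}(\varphi, a) = \exists \overline{U}.\, \varphi(\overline{U}) \wedge \Delta_a(\overline{U}, \overline{V})$, where $\overline{U}$ is a fresh copy of $\overline{V}$.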
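The following worked instance of Definition 4 takes $V = \{x, y\}$, $\varphi = (x = 0 \wedge y = 0)$, and $\overline{U} = (u_x, u_y)$. For illustration only, it assumes that action $a_1$ has guard $y^w > x^r$, so that $\mathit{write}(a_1) = \{y\}$ and $x$ is kept by inertia. The real guard is in the unrecoverable figure of Example 2.

```latex
\begin{align*}
\Delta_{a_1}(\overline{U}, \overline{V}) &= y > u_x \wedge x = u_x
  && \text{(assumed guard } y^w > x^r\text{, inertia for } x\text{)}\\
\mathit{update}(\varphi, a_1) &= \exists u_x, u_y.\; u_x = 0 \wedge u_y = 0 \wedge y > u_x \wedge x = u_x\\
  &\equiv x = 0 \wedge y > 0
\end{align*}
```

The last line is the quantifier elimination step. It substitutes $u_x = 0$ and drops $u_y$, which no longer constrains anything. The result describes exactly the assignments reachable from $[x = 0, y = 0]$ via $a_1$ under the assumed guard, including $[x = 0, y = 3]$.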


Our approach generates formulas of a special shape called history constraints [28]. These are obtained by iterated update operations in combination with a sequence of verification constraints $\vartheta$. Intuitively, the latter depends on the verification property. For now, it suffices to take $\vartheta$ to be an arbitrary sequence of constraints with free variables $V$. Its prefix of length $k$ is denoted by $\vartheta|_k$. We need a fixed set of placeholder variables $V_0$ disjoint from $V$, and assume an injective variable renaming $\nu\colon V \to V_0$. Let $\varphi_\nu$ be the formula $\varphi_\nu = \bigwedge_{v \in V} v = \nu(v)$.


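Definition 5. For a symbolic run $\sigma\colon b_0 \xrightarrow{a_1} b_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} b_n$ and verification constraint sequence $\vartheta = (\vartheta_0, \dots, \vartheta_n)$, the history constraint $h(\sigma, \vartheta)$ is given by $h(\sigma, \vartheta) = \varphi_\nu \wedge \vartheta_0$ if $n = 0$, and $h(\sigma, \vartheta) = \mathit{update}(h(\sigma|_{n-1}, \vartheta|_{n-1}), a_n) \wedge \vartheta_n$ if $n > 0$.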
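Unfolding Definition 5 for a symbolic run with two steps, $\sigma\colon b_0 \xrightarrow{a_1} b_1 \xrightarrow{a_2} b_2$, shows how the placeholder formula, the updates, and the verification constraints interleave:

```latex
\begin{align*}
h(\sigma|_0, (\vartheta_0)) &= \varphi_\nu \wedge \vartheta_0\\
h(\sigma|_1, (\vartheta_0, \vartheta_1)) &= \mathit{update}(\varphi_\nu \wedge \vartheta_0,\, a_1) \wedge \vartheta_1\\
h(\sigma, \vartheta) &= \mathit{update}\big(\mathit{update}(\varphi_\nu \wedge \vartheta_0,\, a_1) \wedge \vartheta_1,\, a_2\big) \wedge \vartheta_2
\end{align*}
```

Initially, $\varphi_\nu$ equates each $v$ with its placeholder $\nu(v)$. Each update only renames and quantifies the copy of $V$, never $V_0$. The placeholders therefore keep recording the initial values, while $V$ tracks the current ones.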


Thus, history constraints are formulas with free variables $V \cup V_0$. Satisfying assignments for history constraints are closely related to assignments in runs:¹


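Lemma 1. Let $\sigma\colon b_0 \xrightarrow{a_1} b_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} b_n$ be a symbolic run and $\vartheta = (\vartheta_0, \dots, \vartheta_n)$. Then $h(\sigma, \vartheta)$ is satisfied by an assignment $\alpha$ with domain $V \cup V_0$ iff $\sigma$ abstracts a run $\rho\colon (b_0, \alpha_0) \xrightarrow{a_1} \cdots \xrightarrow{a_n} (b_n, \alpha_n)$ such that (i) $\alpha_0(v) = \alpha(\nu(v))$ and (ii) $\alpha_n(v) = \alpha(v)$ for all $v \in V$, and (iii) $\alpha_i \models \vartheta_i$ for all $i$, $0 \leq i \leq n$.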


1 Lemma 1 is a slight variation of [28, Lemma 3.5]: Definition 5 differs from history 
constraints in [28] in that the initial assignment is not fixed. A proof can be found 
in [27]. 
2.3 $\mathrm{CTL}^*_f$
For a DDSA B as above, we consider the following verification properties: 


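Definition 6. $\mathrm{CTL}^*_f$ state formulas $\chi$ and path formulas $\psi$ are defined by the following grammar, for constraints $c \in \mathcal{C}(V)$ and control states $b \in B$:

$$\chi ::= \top \mid c \mid b \mid \chi \wedge \chi \mid \neg\chi \mid \mathsf{E}\psi \qquad \psi ::= \chi \mid \psi \wedge \psi \mid \neg\psi \mid \mathsf{X}\psi \mid \mathsf{G}\psi \mid \psi\,\mathsf{U}\,\psi$$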


We use the usual abbreviations $\mathsf{F}\psi = \top\,\mathsf{U}\,\psi$, $\chi_1 \vee \chi_2 = \neg(\neg\chi_1 \wedge \neg\chi_2)$, and $\mathsf{A}\psi = \neg\mathsf{E}\neg\psi$. To simplify the presentation, we do not explicitly treat next-state operators $\langle a\rangle$ via a specific action $a$, as used in Example 1, though this would be possible (cf. [28]). Instead, such an operator can be encoded by adding a fresh data variable $x$ to $V$, adding the conjunct $x^w = 1$ to $\mathit{guard}(a)$ and $x^w = 0$ to all other guards, and replacing $\langle a\rangle\psi$ in the verification property by $\mathsf{X}(\psi \wedge x = 1)$.

The maximal number of nested path quantifiers in a formula $\psi$ is called the quantifier depth of $\psi$, denoted by $\mathit{qd}(\psi)$. We adopt a finite-path semantics for CTL* [44]. For a control state $b \in B$ and a state assignment $\alpha$, let $\mathit{FRuns}(b, \alpha)$ be the set of final runs $\rho\colon (b, \alpha) = (b_0, \alpha_0) \xrightarrow{a_1} \cdots \xrightarrow{a_n} (b_n, \alpha_n)$ such that $b_n \in B_F$ is a final state. The $i$-th configuration $(b_i, \alpha_i)$ in $\rho$ is denoted by $\rho_i$.


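Definition 7. The semantics of $\mathrm{CTL}^*_f$ is inductively defined as follows. For a DDSA $\mathcal{B}$ with configuration $(b, \alpha)$, state formulas $\chi$, $\chi'$, and path formulas $\psi$, $\psi'$:

$(b, \alpha) \models \top$
$(b, \alpha) \models c$ iff $\alpha \models c$
$(b, \alpha) \models b'$ iff $b = b'$
$(b, \alpha) \models \chi \wedge \chi'$ iff $(b, \alpha) \models \chi$ and $(b, \alpha) \models \chi'$
$(b, \alpha) \models \neg\chi$ iff $(b, \alpha) \not\models \chi$
$(b, \alpha) \models \mathsf{E}\psi$ iff $\exists \rho \in \mathit{FRuns}(b, \alpha)$ such that $\rho \models \psi$

where $\rho \models \psi$ iff $\rho, 0 \models \psi$ holds, and for a run $\rho$ of length $n$ and all $i$, $0 \leq i \leq n$:

$\rho, i \models \chi$ iff $\rho_i \models \chi$
$\rho, i \models \psi \wedge \psi'$ iff $\rho, i \models \psi$ and $\rho, i \models \psi'$
$\rho, i \models \neg\psi$ iff $\rho, i \not\models \psi$
$\rho, i \models \mathsf{X}\psi$ iff $i < n$ and $\rho, i+1 \models \psi$
$\rho, i \models \mathsf{G}\psi$ iff for all $j$, $i \leq j \leq n$, it holds that $\rho, j \models \psi$
$\rho, i \models \psi\,\mathsf{U}\,\psi'$ iff $\exists k$ with $i + k \leq n$ such that $\rho, i+k \models \psi'$ and for all $j$, $0 \leq j < k$, it holds that $\rho, i+j \models \psi$.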
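To make the finite-path clauses concrete, here is a small Python evaluator for path formulas over a single finite run. It is a sketch that assumes state formulas are given as Python predicates on configurations. The tuple encoding of formulas and the names `holds` and `eventually` are hypothetical. Note that $\mathsf{X}$ fails at the last position, and $\mathsf{G}$ only ranges up to $n$.

```python
# Path formulas as tuples (a hypothetical encoding):
#   ("state", pred) | ("and", f, g) | ("not", f) | ("X", f) | ("G", f) | ("U", f, g)
# where pred is a Python predicate on a configuration (b, alpha).
# A run is a list of configurations at positions 0..n.

def holds(run, i, f):
    """rho, i |= f under the finite-path semantics of Definition 7."""
    n = len(run) - 1
    tag = f[0]
    if tag == "state":   # rho, i |= chi  iff  rho_i |= chi
        return f[1](run[i])
    if tag == "and":
        return holds(run, i, f[1]) and holds(run, i, f[2])
    if tag == "not":
        return not holds(run, i, f[1])
    if tag == "X":       # there must be a next position
        return i < n and holds(run, i + 1, f[1])
    if tag == "G":       # every position from i up to n
        return all(holds(run, j, f[1]) for j in range(i, n + 1))
    if tag == "U":       # some k with i + k <= n, f[1] holding before it
        return any(holds(run, i + k, f[2])
                   and all(holds(run, i + j, f[1]) for j in range(k))
                   for k in range(n - i + 1))
    raise ValueError(f"unknown operator {tag!r}")

def eventually(f):       # F psi abbreviates T U psi
    return ("U", ("state", lambda c: True), f)

run = [("b1", {"x": 3, "y": 0}), ("b2", {"x": 3, "y": 3}), ("b3", {"x": 3, "y": 3})]
x_gt_2 = ("state", lambda c: c[1]["x"] > 2)
at_b3 = ("state", lambda c: c[0] == "b3")
print(holds(run, 0, ("G", x_gt_2)))           # True
print(holds(run, 0, eventually(at_b3)))       # True
print(holds(run, 2, ("X", x_gt_2)))           # False: position 2 is the last one
print(holds(run, 2, ("not", ("X", x_gt_2))))  # True, while X(not x > 2) is False here
```

The last two lines show why the finite semantics needs care: at the end of a run, $\neg\mathsf{X}\psi$ and $\mathsf{X}\neg\psi$ differ.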


Instead of simply checking whether the initial configuration of a DDSA $\mathcal{B}$ satisfies a $\mathrm{CTL}^*_f$ property $\chi$, we try to determine, for every state $b \in B$, which constraints on variables need to hold in order to satisfy $\chi$. As the number of configurations $(b, \alpha)$ of a DDSA $\mathcal{B}$ is usually infinite, configuration sets cannot be enumerated explicitly. Instead, we represent a set of configurations as a configuration map $K\colon B \to \mathcal{C}(V)$. This map associates with every control state $b \in B$ a formula $K(b) \in \mathcal{C}(V)$, representing all configurations $(b, \alpha)$ such that $\alpha \models K(b)$.

We now define when a configuration map captures the maximal set of configurations in which a formula $\chi$ holds. We call such maps witness maps.

Definition 8. For a DDSA $\mathcal{B}$ and state formula $\chi$, a configuration map $K$ is a witness map if $(b, \alpha) \models \chi$ iff $\alpha \models K(b)$, for all $b \in B$ and all $\alpha$.


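For instance, for $\mathcal{B}$ from Example 2 and $\chi_1 = \mathsf{AG}(x > 2)$, a witness map is given by $K = \{b_1 \mapsto \bot,\ b_2 \mapsto x > 2 \wedge y > 2,\ b_3 \mapsto x > 2\}$. For $\chi_2 = \mathsf{EX}\,\mathsf{AG}(x > 2)$, a solution is $K' = \{b_1 \mapsto x > 2,\ b_2 \mapsto y > 2,\ b_3 \mapsto \bot\}$. As $b_1$ is the initial state, $\mathcal{B}$ satisfies $\chi_2$ with every initial assignment that sets $\alpha_I(x) > 2$.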

In this paper we address the problem of finding a witness map for $\mathcal{B}$ and $\chi$. In particular, a witness map allows one to decide the commonly studied verification problem. This problem asks whether $(b_I, \alpha_I) \models \chi$ holds, which can be decided by testing $\alpha_I \models K(b_I)$. It remains open whether there exist a DDSA $\mathcal{B}$ and a property $\chi$ for which no witness map exists because the configuration set satisfying $\chi$ is not finitely representable. Even if a witness map exists, finding it is in general undecidable. However, in this paper we identify DDSA classes where a witness map can always be found.


3 LTL with Configuration Maps 


Following a common approach to CTL* verification, our technique processes the property $\chi$ bottom-up and computes solutions for each subformula $\mathsf{E}\psi$. It then solves a linear-time model checking problem $\psi'$ in which the solutions to subformulas appear as atoms. Given our representation of sets of configurations, we use LTL formulas whose atoms are configuration maps. We denote this specification language by $\mathrm{LTL}_f^{\mathcal{K}}$. For a given DDSA $\mathcal{B}$, it is formally defined as follows:

$$\psi ::= K \mid \psi \wedge \psi \mid \neg\psi \mid \mathsf{X}\psi \mid \mathsf{G}\psi \mid \psi\,\mathsf{U}\,\psi$$
where $K \in \mathcal{K}_{\mathcal{B}}$, the set of configuration maps for $\mathcal{B}$.


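Definition 9. A run $\rho$ of length $n$ satisfies an $\mathrm{LTL}_f^{\mathcal{K}}$ formula $\psi$, denoted $\rho \models_{\mathcal{K}} \psi$, iff $\rho, 0 \models_{\mathcal{K}} \psi$ holds, where for all $i$, $0 \leq i \leq n$:

$\rho, i \models_{\mathcal{K}} K$ iff $\rho_i = (b, \alpha)$ and $\alpha \models K(b)$;
$\rho, i \models_{\mathcal{K}} \psi \wedge \psi'$ iff $\rho, i \models_{\mathcal{K}} \psi$ and $\rho, i \models_{\mathcal{K}} \psi'$;
$\rho, i \models_{\mathcal{K}} \neg\psi$ iff $\rho, i \not\models_{\mathcal{K}} \psi$;
$\rho, i \models_{\mathcal{K}} \mathsf{X}\psi$ iff $i < n$ and $\rho, i+1 \models_{\mathcal{K}} \psi$;
$\rho, i \models_{\mathcal{K}} \mathsf{G}\psi$ iff $\rho, i \models_{\mathcal{K}} \psi$ and ($i = n$ or $\rho, i+1 \models_{\mathcal{K}} \mathsf{G}\psi$);
$\rho, i \models_{\mathcal{K}} \psi\,\mathsf{U}\,\psi'$ iff $\rho, i \models_{\mathcal{K}} \psi'$ or ($i < n$ and $\rho, i \models_{\mathcal{K}} \psi$ and $\rho, i+1 \models_{\mathcal{K}} \psi\,\mathsf{U}\,\psi'$).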
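Definition 9 gives $\mathsf{G}$ and $\mathsf{U}$ as one-step unfoldings. These clauses can therefore be evaluated right to left over a finite run in a single pass per subformula. The Python sketch below does this. It assumes that configuration maps are encoded as dictionaries from control states to predicates on assignments. This encoding and the names `sat_positions` and `models` are hypothetical. The configuration map is $K$ from Example 3, and the run is the witness run discussed after Theorem 1.

```python
# A configuration map K is encoded (hypothetically) as a dict from control
# states to predicates on assignments. Formulas are tuples:
#   ("K", kmap) | ("and", f, g) | ("not", f) | ("X", f) | ("G", f) | ("U", f, g)

def sat_positions(run, f):
    """List of truth values of rho, i |=_K f for i = 0..n (Definition 9)."""
    n = len(run) - 1
    tag = f[0]
    if tag == "K":                       # alpha |= K(b) at each position
        return [f[1][b](alpha) for b, alpha in run]
    if tag == "and":
        left, right = sat_positions(run, f[1]), sat_positions(run, f[2])
        return [p and q for p, q in zip(left, right)]
    if tag == "not":
        return [not p for p in sat_positions(run, f[1])]
    if tag == "X":
        s = sat_positions(run, f[1])
        return [i < n and s[i + 1] for i in range(n + 1)]
    res = [False] * (n + 1)
    if tag == "G":                       # psi and (i = n or G psi at i+1)
        s = sat_positions(run, f[1])
        for i in range(n, -1, -1):
            res[i] = s[i] and (i == n or res[i + 1])
        return res
    if tag == "U":                       # psi' or (i < n and psi and (psi U psi') at i+1)
        s1, s2 = sat_positions(run, f[1]), sat_positions(run, f[2])
        for i in range(n, -1, -1):
            res[i] = s2[i] or (i < n and s1[i] and res[i + 1])
        return res
    raise ValueError(f"unknown operator {tag!r}")

def models(run, f):                      # rho |=_K f  iff  rho, 0 |=_K f
    return sat_positions(run, f)[0]

# K from Example 3 and the witness run discussed after Theorem 1.
K = {"b1": lambda a: False,
     "b2": lambda a: a["x"] > 2 and a["y"] > 2,
     "b3": lambda a: a["x"] > 2}
run = [("b1", {"x": 3, "y": 0}), ("b2", {"x": 3, "y": 3}), ("b3", {"x": 3, "y": 3})]
print(models(run, ("X", ("K", K))))      # True
print(models(run, ("K", K)))             # False, since K(b1) is bottom
```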


Our approach to $\mathrm{LTL}_f^{\mathcal{K}}$ verification proceeds along the lines of the $\mathrm{LTL}_f$ procedure from [28], with the difference that simple constraint atoms are replaced by configuration maps. To express the requirements that a run of a DDSA $\mathcal{B}$ must meet to satisfy an $\mathrm{LTL}_f^{\mathcal{K}}$ formula $\psi$, we use a nondeterministic finite automaton (NFA) $\mathcal{N}_\psi = (Q, \Sigma, \varrho, q_0, Q_F)$. Here the states $Q$ are a set of subformulas of $\psi$, $\Sigma = 2^{\mathcal{K}_{\mathcal{B}}}$ is the alphabet, $\varrho$ is the transition relation, $q_0 \in Q$ is the initial state, and $Q_F \subseteq Q$ is the set of final states. The construction of $\mathcal{N}_\psi$ is standard [15,28] and treats configuration maps for the time being as propositions. For completeness, it is described in [27, Appendix C]. For instance, for a configuration map $K$, $\psi = \mathsf{F}K$ corresponds to the NFA [NFA diagram] and $\psi' = \mathsf{X}K$ to the NFA [NFA diagram]. (For simplicity, edge labels $\{K\}$ are shown as $K$, and edge labels $\emptyset$ are omitted.)
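For $w_i \in \Sigma$, i.e., $w_i$ is a set of configuration maps, $w_i(b)$ denotes the formula $\bigwedge_{K \in w_i} K(b)$. Moreover, for $w = w_0, \dots, w_n \in \Sigma^*$ and a symbolic run $\sigma\colon b_0 \xrightarrow{a_1} b_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} b_n$, let $w \otimes \sigma$ denote the sequence of formulas $(w_0(b_0), \dots, w_n(b_n))$, i.e., the component-wise application of $w$ to the control states of $\sigma$. A word $w_0, \dots, w_n \in \Sigma^*$ is consistent with a run $(b_0, \alpha_0) \xrightarrow{a_1} (b_1, \alpha_1) \xrightarrow{a_2} \cdots \xrightarrow{a_n} (b_n, \alpha_n)$ if $\alpha_i \models w_i(b_i)$ for all $i$, $0 \leq i \leq n$. The key correctness property of $\mathcal{N}_\psi$ is the following (cf. [28, Lemma 4.4]; see [27] for the proof adapted to $\mathrm{LTL}_f^{\mathcal{K}}$):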


Lemma 2. $\mathcal{N}_\psi$ accepts a word that is consistent with a run $\rho$ iff $\rho \models_{\mathcal{K}} \psi$.


Product Construction. As a next step in our verification procedure, given a control state $b$ of $\mathcal{B}$, we aim to find (a symbolic representation of) all configurations $(b, \alpha)$ that satisfy an $\mathrm{LTL}_f^{\mathcal{K}}$ formula $\psi$. To that end, we combine $\mathcal{N}_\psi$ with $\mathcal{B}$ into a cross-product automaton $\mathcal{N}_{\mathcal{B},b}^{\psi}$. For technical reasons, the product construction must shift the steps in $\mathcal{B}$ by one with respect to the steps in $\mathcal{N}_\psi$. Hence, given $b \in B$, let $\mathcal{B}_b$ be the DDSA obtained from $\mathcal{B}$ by adding a dummy initial state $\hat{b}$. Thus $\mathcal{B}_b$ has state set $B' = B \cup \{\hat{b}\}$ and transition relation $T' = T \cup \{(\hat{b}, a_0, b)\}$ for a fresh action $a_0$ with $\mathit{guard}(a_0) = \top$.


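Definition 10. The product automaton $\mathcal{N}_{\mathcal{B},b}^{\psi}$ is defined for an $\mathrm{LTL}_f^{\mathcal{K}}$ formula $\psi$, a DDSA $\mathcal{B}$, and a control state $b \in B$. Let $\mathcal{B}_b = \langle B', \hat{b}, \mathcal{A}, T', B_F, V, \mathit{guard}\rangle$ and let $\mathcal{N}_\psi$ be as above. Then $\mathcal{N}_{\mathcal{B},b}^{\psi} = (P, R, p_0, P_F)$ is defined as follows:

• $P \subseteq B' \times Q \times \mathcal{C}(V \cup V_0)$, i.e., states in $P$ are triples $(b, q, \varphi)$;

• the initial state is $p_0 = (\hat{b}, q_0, \varphi_\nu)$;

• if $b \xrightarrow{a} b'$ in $T'$, $q \xrightarrow{w} q'$ in $\mathcal{N}_\psi$, and $\mathit{update}(\varphi, a) \wedge w(b')$ is satisfiable, then there is a transition $(b, q, \varphi) \xrightarrow{a, w} (b', q', \varphi')$ in $R$ such that $\varphi' = \mathit{update}(\varphi, a) \wedge w(b')$;

• $(b', q', \varphi')$ is in the set of final states $P_F \subseteq P$ iff $b' \in B_F$ and $q' \in Q_F$.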
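Definition 10 is essentially a worklist exploration. The Python sketch below mirrors it under a strong simplifying assumption: variable values range over a small finite domain. A formula over $V_0 \cup V$ can then be stored extensionally as the set of its (initial, current) value pairs. Under this assumption, update becomes a relational image and satisfiability becomes non-emptiness. The sketch thereby sidesteps the symbolic representation and quantifier elimination that the actual construction requires. It also guarantees termination trivially, which the symbolic construction does not. The names `b_hat`, `a0`, and `product_automaton`, and the encoding of NFA transitions as triples $(q, w, q')$, are hypothetical.

```python
from collections import deque
from itertools import product

DUMMY, A0 = "b_hat", "a0"   # hypothetical names for the fresh state and action

def product_automaton(T, BF, delta, nfa, kmaps, b, domain, nvars):
    """Explicit-state sketch of Definition 10 over a FINITE value domain.

    A formula over V0 and V is stored extensionally as a frozenset of
    (initial, current) value tuples; delta[a](old, new) plays the role of
    the transition formula Delta_a (guard plus inertia).
    """
    rho, q0, QF = nfa                     # rho: set of (q, w, q2), w a frozenset of map names
    T2 = set(T) | {(DUMMY, A0, b)}        # B_b: dummy initial state leading to b
    delta2 = {**delta, A0: lambda old, new: old == new}   # guard T, nothing written
    values = list(product(domain, repeat=nvars))
    phi_nu = frozenset((v, v) for v in values)           # current values = initial ones
    p0 = (DUMMY, q0, phi_nu)
    P, R, todo = {p0}, set(), deque([p0])
    while todo:
        p = todo.popleft()
        src, q, phi = p
        for (s, a, b2) in T2:
            if s != src:
                continue
            # update(phi, a): relational image, forgetting the old values
            upd = {(init, new) for (init, old) in phi
                   for new in values if delta2[a](old, new)}
            for (q1, w, q2) in rho:
                if q1 != q:
                    continue
                # conjoin w(b2): all configuration maps in w must hold at b2
                phi2 = frozenset((i, c) for (i, c) in upd
                                 if all(kmaps[k][b2](c) for k in w))
                if not phi2:              # unsatisfiable: no product transition
                    continue
                p2 = (b2, q2, phi2)
                R.add((p, a, w, p2))
                if p2 not in P:
                    P.add(p2)
                    todo.append(p2)
    PF = {p for p in P if p[0] in BF and p[1] in QF}
    return P, R, p0, PF

# Toy one-variable DDSA b1 --a1--> b2 with ASSUMED guard x^w = x^r + 1, and
# the NFA q0 --{}--> q1 --{K}--> q2 (q2 final, with a {} self-loop) for X K.
T = {("b1", "a1", "b2")}
delta = {"a1": lambda old, new: new[0] == old[0] + 1}
kmaps = {"K": {"b1": lambda c: True, "b2": lambda c: c[0] >= 2}}
nfa = ({("q0", frozenset(), "q1"), ("q1", frozenset({"K"}), "q2"),
        ("q2", frozenset(), "q2")}, "q0", {"q2"})
P, R, p0, PF = product_automaton(T, {"b2"}, delta, nfa, kmaps, "b1", range(4), 1)
for (state, q, phi) in PF:
    print(state, q, sorted(phi))   # b2 q2 [((1,), (2,)), ((2,), (3,))]
```

Projecting the final formula onto its initial component yields the initial values $x \in \{1, 2\}$ from which $\mathsf{X}K$ can be witnessed. This answer is only meaningful relative to the truncated domain $\{0, \dots, 3\}$.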


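Example 3. Consider the DDSA $\mathcal{B}$ from Example 2, and let $K = \{b_1 \mapsto \bot,\ b_2 \mapsto x > 2 \wedge y > 2,\ b_3 \mapsto x > 2\}$. The property $\psi = \mathsf{X}K$ is captured by the NFA [NFA diagram]. The product automata $\mathcal{N}_{\mathcal{B},b_1}^{\psi}$ and $\mathcal{N}_{\mathcal{B},b_2}^{\psi}$ are as follows: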


[Figure: the product automata $\mathcal{N}_{\mathcal{B},b_1}^{\psi}$ and $\mathcal{N}_{\mathcal{B},b_2}^{\psi}$. Each node is a triple (control state, NFA state, formula over $V \cup V_0$), starting from the dummy state with formula $x = x_0 \wedge y = y_0$; edges are labeled with the actions $a_0$, $a_1$, $a_2$, $a_3$.]


where the shaded nodes are final. The formulas in nodes were obtained by applying quantifier elimination to the formulas built using update according to Definition 10. $\mathcal{N}_{\mathcal{B},b_3}^{\psi}$ consists only of the dummy transition and has no final states.


CTL* Model Checking for Data-Aware Dynamic Systems with Arithmetic 45 


The construction in Definition 10 need not terminate if infinitely many non-equivalent formulas occur. In Sect. 4 we will identify a criterion that guarantees termination. First, we state the key correctness property, which lifts [28, Theorem 4.7] to LTL with configuration maps. Its proof is similar to that of the respective result in [28] and can be found in [27].


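Theorem 1. Let $\psi \in \mathrm{LTL}_f^{\mathcal{K}}$ and $b \in B$ be such that there is a finite product automaton $\mathcal{N}_{\mathcal{B},b}^{\psi}$. Then there is a final run $\rho\colon (b, \alpha_0) \to^* (b_F, \alpha_F)$ of $\mathcal{B}$ such that $\rho \models_{\mathcal{K}} \psi$ iff $\mathcal{N}_{\mathcal{B},b}^{\psi}$ has a final state $(b_F, q_F, \varphi)$, for some $q_F$ and $\varphi$, such that $\varphi$ is satisfied by an assignment $\gamma$ with $\gamma(\overline{V}_0) = \alpha_0(\overline{V})$ and $\gamma(\overline{V}) = \alpha_F(\overline{V})$.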


Thus, witnesses for ψ correspond to paths to final states in the product automaton. For example, in N^B_{ψ,b_1} in Example 3 the formula in the left final node is satisfied by γ(x_0) = γ(x) = γ(y) = 3 and γ(y_0) = 0. For α_0 and α_2 such that α_0(V) = γ(V_0) = {x ↦ 3, y ↦ 0} and α_2(V) = γ(V) = {x ↦ 3, y ↦ 3}, there is a witness run for ψ from (b_1, α_0) to (b_3, α_2), e.g., a run (b_1, α_0) → (b_2, α_1) → (b_3, α_2) for a suitable assignment α_1.


4 Model Checking Procedure 


Using the results of Sect. 3, we define a model checking procedure, shown in Fig. 1. First, we explain the tasks achieved by the three mutually recursive functions:

• checkState(χ) returns a configuration map representing the set of configurations that satisfy a state formula χ. In the base cases, it returns a function that checks the respective condition. For boolean operators we recurse on the arguments, and for a formula Eψ we proceed to the checkPath procedure.

• checkPath(ψ) returns a configuration map K that represents all configurations from which a path satisfying ψ exists. First, toLTL_K is used to obtain an equivalent LTL_f^K formula ψ'. This entails the computation of solutions for all subproperties Eη. Then the solution K is constructed as follows. For every control state b, we build the product automaton N^B_{ψ',b} and collect the set Φ_b of formulas in final states. Every φ ∈ Φ_b encodes runs from b to a final state of B that satisfy ψ'. The variables V_0 and V in φ act as placeholders for the initial and the final values of the runs, respectively. We rename the variables in φ to use V at the start and U at the end, and we quantify existentially over U, as the final valuation is irrelevant. We then take the disjunction over all φ ∈ Φ_b. The resulting formula φ' encodes all final runs from b that satisfy ψ', so we set K(b) := φ'.

• toLTL_K(ψ) computes an LTL_f^K formula equivalent to a path formula ψ. To this end, it performs two kinds of replacements in ψ: (a) ⊤, b ∈ B, and constraints c are represented as configuration maps; and (b) subformulas Eη are replaced by their solutions K_{Eη}, which are computed by a recursive call to checkPath.
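The renaming and projection performed by checkPath can be sketched under a finite-domain assumption of our own. Each final-node formula φ is given as a Python predicate on an (initial, final) pair of assignments, and ∃U is implemented by enumerating all candidate final assignments. The function and formula names are hypothetical.

```python
from itertools import product


def solution_for_state(final_formulas, variables, domain):
    """Sketch of K(b) := ⋁_{φ ∈ Φ_b} ∃U. φ(V, U) over a finite domain (assumption).

    Each φ is a predicate phi(initial, final) on two assignment dicts. After the
    renaming, the initial values play the role of V and the final values the role
    of U, which is quantified away by enumerating every candidate assignment.
    """
    candidates = [dict(zip(variables, vals)) for vals in product(domain, repeat=len(variables))]

    def k_b(alpha):
        # disjunction over Φ_b, existential quantification over U
        return any(phi(alpha, u) for phi in final_formulas for u in candidates)

    return k_b


# Two made-up final-node formulas: "nothing changes and x < 2 at the end",
# and "nothing changes and y < 2 at the end".
def phi_x(init, fin):
    return fin == init and fin["x"] < 2


def phi_y(init, fin):
    return fin == init and fin["y"] < 2


k_b2 = solution_for_state([phi_x, phi_y], ["x", "y"], range(4))
```

On this domain, the resulting predicate holds exactly for the assignments with x < 2 or y < 2.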

To represent the base cases of formulas as configuration maps in Fig. 1, we define K_⊤ := (λ_. ⊤), K_b := (λb'. b = b' ? ⊤ : ⊥) for all b ∈ B, and K_c := (λ_. c) for constraints c. We also write ¬K for (λb. ¬K(b)) and K ∧ K' for (λb. K(b) ∧ K'(b)). The next example illustrates the approach.
procedure checkState(χ)
  switch χ do
    case ⊤, b ∈ B, or c ∈ C: return K_χ
    case χ1 ∧ χ2: return checkState(χ1) ∧ checkState(χ2)
    case ¬χ: return ¬checkState(χ)
    case Eψ: return checkPath(ψ)

procedure checkPath(ψ)
  ψ' := toLTL_K(ψ)
  for b ∈ B do
    (P, R, p_0, P_F) := N^B_{ψ',b}        ▷ product automaton for ψ', B, and b
    Φ := {φ | (b_F, q_F, φ) ∈ P_F}         ▷ collect formulas in final states
    K(b) := ⋁_{φ ∈ Φ} ∃U. φ(V, U)
  return K

procedure toLTL_K(ψ)
  switch ψ do
    case ⊤, b ∈ B, or c ∈ C: return K_ψ
    case ψ1 ∧ ψ2: return toLTL_K(ψ1) ∧ toLTL_K(ψ2)
    case ¬ψ: return ¬toLTL_K(ψ)
    case Eψ: return checkPath(ψ)
    case Xψ: return X toLTL_K(ψ)
    case Gψ: return G toLTL_K(ψ)
    case ψ1 U ψ2: return toLTL_K(ψ1) U toLTL_K(ψ2)

Fig. 1. Model checking procedure.


Example 4. Consider χ = EX(AG(x ≥ 2)) and the DDSA B in Example 2. To get a solution K_1 to checkState(χ) = checkPath(ψ_1) for ψ_1 = X(AG(x ≥ 2)), we first compute an equivalent LTL_f^K formula ψ_1' = X K_2. Here K_2 is a solution to AG(x ≥ 2) ≡ ¬EF(x < 2). To this end, we run checkPath(ψ_2) for ψ_2 = F(x < 2), which is represented in LTL_f^K as ψ_2' = F(K_{x<2}) with NFA [NFA figure]. Next, checkPath builds N^B_{ψ_2',b} for all states b. For instance, for b_2 we get:

[Figure: product automaton N^B_{ψ_2',b_2}; its final nodes carry the formulas φ_1, φ_2, and φ_3.]

Dashed arrows indicate transitions to non-final sink states. For U = (x̂, ŷ) and the formulas φ_1, φ_2, and φ_3 in final nodes, we compute the projections ∃U. φ_1(V, U), ∃U. φ_2(V, U), and ∃U. φ_3(V, U). As a result, K_3 := checkPath(ψ_2) sets K_3(b_2) = ⋁_{i=1}^{3} ∃U. φ_i(V, U) ≡ x < 2 ∨ y < 2.

For reasons of space, the constructions for b_1 and b_3 are shown in [27, Appendix B]; we obtain K_3(b_1) = ⊤ and K_3(b_3) = x < 2. By negation, the solution K_2 to AG(x ≥ 2) is K_2 = ¬K_3 = {b_1 ↦ ⊥, b_2 ↦ x ≥ 2 ∧ y ≥ 2, b_3 ↦ x ≥ 2}. Now we can proceed with checkPath(ψ_1). The NFA and product automata for ψ_1' = X K_2 are as shown in Example 3. In a similar way as above, we obtain the solution K_1 for EX AG(x ≥ 2) as K_1 = {b_1 ↦ x ≥ 2, b_2 ↦ y ≥ 2, b_3 ↦ ⊥}. Thus, B satisfies the property for any initial assignment α_I with α_I(x) ≥ 2.


Next we prove correctness of checkState(χ) under the condition that it is defined, i.e., all required product automata are finite. First we state our main result. Before giving its proof, we show helpful properties of toLTL_K and checkPath.


Theorem 2. For every configuration (b, α) of the DDSA B and every state property χ, if checkState(χ) is defined then (b, α) ⊨ χ iff α ⊨ checkState(χ)(b).


Lemma 3. Let ψ be a path formula with qd(ψ) = k. Suppose that for all configurations (b, α) and path formulas ψ' with qd(ψ') < k, there is a ρ' ∈ FRuns(b, α) with ρ' ⊨ ψ' iff α ⊨ checkPath(ψ')(b). Then, for all runs ρ, ρ ⊨ ψ iff ρ ⊨_K toLTL_K(ψ).


Proof (sketch). By induction on ψ. The base cases are by the definitions of K_⊤, K_b, and K_c. In the induction step, if ψ = Eψ' then ρ ⊨ ψ iff there is a ρ' ∈ FRuns(b_0, α_0) with ρ' ⊨ ψ', where ρ_0 = (b_0, α_0) is the first configuration of ρ. As qd(ψ') < qd(ψ), this holds by assumption iff α_0 ⊨ checkPath(ψ')(b_0). This is equivalent to ρ ⊨_K toLTL_K(ψ) = checkPath(ψ'). All other cases are by the induction hypothesis and Definitions 7 and 9.


Lemma 4. Let ψ' = toLTL_K(ψ) be such that for all runs ρ, ρ ⊨ ψ iff ρ ⊨_K ψ'. Then there is a run ρ ∈ FRuns(b, α) with ρ ⊨ ψ iff α ⊨ checkPath(ψ)(b).


Proof. (⇒) Suppose there is a run ρ ∈ FRuns(b, α) with ρ ⊨ ψ, so ρ is of the form (b, α) →* (b_F, α_F) for some b_F ∈ B_F. By assumption, this implies ρ ⊨_K ψ'. By Theorem 1, N^B_{ψ',b} then has a final state (b_F, q_F, φ) where φ is satisfied by an assignment γ with domain V_0 ∪ V such that γ(V_0) = α(V) and γ(V) = α_F(V). By definition, checkPath(ψ)(b) contains a disjunct ∃U. φ(V, U). As γ satisfies φ and γ(V_0) = α(V), we get α ⊨ checkPath(ψ)(b). (⇐) Suppose α ⊨ checkPath(ψ)(b). By definition of checkPath, there is a formula φ such that α ⊨ ∃U. φ(V, U) and φ occurs in a final state (b_F, q_F, φ) of N^B_{ψ',b}. Hence there is an assignment γ with domain V_0 ∪ V and γ(V_0) = α(V) such that γ ⊨ φ. By Theorem 1, there is a run ρ: (b, α) →* (b_F, α_F) such that ρ ⊨_K ψ'. By the assumption, we have ρ ⊨ ψ.


At this point the main theorem can be proven: 


Proof (of Theorem 2). We first show (⋆): for any path formula ψ, there is a run ρ ∈ FRuns(b, α) with ρ ⊨ ψ iff α ⊨ checkPath(ψ)(b). The proof is by induction on qd(ψ). If ψ contains no path quantifiers, Lemma 3 implies that ρ ⊨ ψ iff ρ ⊨_K toLTL_K(ψ) for all runs ρ, so (⋆) follows from Lemma 4. In the induction step, we use the induction hypothesis of (⋆) as the assumption of Lemma 3 and conclude that ρ ⊨ ψ iff ρ ⊨_K toLTL_K(ψ) for all runs ρ. Again, (⋆) follows from Lemma 4.

The theorem is then shown by induction on χ. The base cases ⊤, b ∈ B, and c ∈ C are easy to check. For properties of the form ¬χ' and χ_1 ∧ χ_2, the claim follows from the induction hypothesis and the definitions. Finally, for χ = Eψ, (b, α) ⊨ χ iff there is a run ρ ∈ FRuns(b, α) such that ρ ⊨ ψ. By (⋆) this is the case iff α ⊨ checkPath(ψ)(b) = checkState(χ)(b).


Termination. We next show that the formulas generated in our procedure all have a particular shape, which yields an abstract termination result. For a set of formulas Φ ⊆ C(V) and a symbolic run σ, a history constraint h(σ, ϑ) is over basis Φ if ϑ = (ϑ_0, ..., ϑ_n) and for all i, 1 ≤ i ≤ n, there is a subset T_i ⊆ Φ such that ϑ_i = ⋀T_i. Moreover, for a set of formulas Φ, let Φ^± = Φ ∪ {¬φ | φ ∈ Φ}.


Definition 11. For a DDSA B, a constraint set C over free variables V, and k ≥ 0, the formula sets Φ_k are inductively defined by Φ_0 = C ∪ {⊤, ⊥} and

$$\Phi_{k+1} = \Big\{ \bigvee_{\varphi \in H} \exists U.\, \varphi(V, U) \;\Big|\; H \subseteq \mathcal{H}_k \Big\}$$

where ℋ_k is the set of all history constraints of B with basis ⋃_{i≤k} Φ_i^±.


Note that formulas in Φ_k have free variables V, while those in ℋ_k have free variables V_0 ∪ V. We next show that these sets correspond to the formulas generated by our procedure, provided all constraints in the verification property are in C.


Lemma 5. Let Eψ have quantifier depth k, let ψ' = toLTL_K(ψ), and let N^B_{ψ',b} be a product automaton constructed in checkPath(ψ) for some b ∈ B. Then

(1) for all nodes (b', q, φ) in N^B_{ψ',b} there is some φ' ∈ ℋ_k such that φ ≡ φ', and
(2) checkPath(ψ)(b) is equivalent to a formula in Φ_{k+1}.


The statements are proven by induction on k, using the results on the product 
construction ([27, Lemma 6]). From part (1) of this lemma and Theorem 2 we 
thus obtain an abstract criterion for decidability that will be useful in the next 
section: 


Corollary 1. For a DDSA B as above and a state formula χ, if ℋ_j(b) is finite up to equivalence for all j ≤ qd(χ) and b ∈ B, then a witness map can always be computed.


Proof. By the assumption about the sets ℋ_j(b) for j ≤ qd(χ), all product automata constructions in recursive calls checkPath(ψ) of checkState(χ) terminate if logical equivalence of formulas is checked eagerly. Thus checkState(χ) is defined, and by Theorem 2 the result is a witness map.


The property that all sets ℋ_j(b), j ≤ qd(χ), are finite might itself be undecidable. However, in the next section we show ways to guarantee this property. Moreover, finiteness of all ℋ_j(b) implies a finite history set, a decidability criterion identified for the linear-time case [28, Definition 3.6]. Example 5 below, however, illustrates that the requirement on the ℋ_j(b)'s is strictly stronger.
5 Decidability of DDSA Classes


We here illustrate restrictions on DDSAs, either on the control flow or on the constraint language, that turn our approach into a decision procedure for CTL*_f.


Monotonicity constraints (MCs) restrict constraints (Definition 1) as follows. MCs over variables V and domain D have the form p ⊙ q where p, q ∈ D ∪ V and ⊙ is one of =, ≠, <, ≤, ≥, or >. The domain D may be ℝ or ℚ. We call a boolean formula whose atoms are MCs an MC formula. A DDSA where all atoms in guards are MCs is an MC-DDSA, and a CTL*_f property whose constraint atoms are MCs is an MC property. For instance, B in Example 2 is an MC-DDSA.

We exploit a useful quantifier elimination property. If φ is an MC formula over a set of constants L and variables V ∪ {x}, there is some φ' ≡ ∃x. φ such that φ' is a quantifier-free MC formula over V and L. Such a φ' can be obtained by writing φ in disjunctive normal form and applying a Fourier-Motzkin procedure [36, Sect. 5.4] to each disjunct. This guarantees that all constants in φ' also occur in φ.
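The elimination step can be sketched in Python for a single conjunction of MCs, i.e., one disjunct of the DNF. This is the textbook Fourier-Motzkin step specialised to atoms of the form p ⊙ q. We wrote it for illustration; it is not code from [36] or from the tool. It assumes that > and ≥ have been rewritten as < and ≤ with swapped arguments, and that ≠ has been split into two disjuncts. The step is only correct over dense domains without endpoints, such as ℚ or ℝ.

```python
from numbers import Number

# Sketch (our own encoding): an MC atom is a triple (p, op, q) with op in
# {"<", "<=", "="}, where p and q are variable names (str) or numeric constants.
# Assumed preprocessing: p > q became q < p, p >= q became q <= p, and p != q
# was split into p < q or q < p when building the DNF.


def _split_equalities(atoms):
    out = []
    for p, op, q in atoms:
        out += [(p, "<=", q), (q, "<=", p)] if op == "=" else [(p, op, q)]
    return out


def _simplify(p, op, q):
    """Return True/False for trivial atoms, otherwise the atom unchanged."""
    if p == q:
        return op == "<="
    if isinstance(p, Number) and isinstance(q, Number):
        return p < q if op == "<" else p <= q
    return (p, op, q)


def eliminate(x, atoms):
    """One Fourier-Motzkin step: atoms equivalent to ∃x. ⋀atoms over ℚ or ℝ.

    Returns a list of atoms (the empty list means ⊤) or None if unsatisfiable.
    """
    lower, upper, rest = [], [], []
    for atom in _split_equalities(atoms):
        s = _simplify(*atom)
        if s is True:
            continue
        if s is False:
            return None
        p, op, q = s
        if q == x:
            lower.append((p, op))    # p op x
        elif p == x:
            upper.append((op, q))    # x op q
        else:
            rest.append(s)
    for l, op1 in lower:
        for op2, u in upper:
            # l op1 x op2 u yields l op u, strict if either bound is strict
            s = _simplify(l, "<" if "<" in (op1, op2) else "<=", u)
            if s is False:
                return None
            if s is not True and s not in rest:
                rest.append(s)
    return rest


result = eliminate("x", [("y", "<", "x"), ("x", "<=", 3), ("z", "=", "x")])
unsat = eliminate("x", [("x", "<", 1), (2, "<", "x")])
dense_only = eliminate("x", [(0, "<", "x"), ("x", "<", 1)])
```

Every new atom pairs an existing lower bound with an existing upper bound. The result is therefore again a conjunction of MCs that mentions only constants from the input. The last call shows why density matters. It returns the empty conjunction (⊤) for ∃x. 0 < x < 1, which is correct over ℚ but would be wrong over ℤ.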


Theorem 3. For any DDSA B and property χ over monotonicity constraints, a witness map is computable.


Proof. Let χ be an MC property, and L the finite set of constants in constraints in χ, α_0, and guards of B. Moreover, let MC_L be the set of quantifier-free formulas whose atoms are MCs over V ∪ V_0 and L, so MC_L is finite up to equivalence. We show the following property (⋆): all history constraints h(σ, ϑ) over basis MC_L are equivalent to a formula in MC_L. For a symbolic run σ: b_0 →* b_{n−1} →a_n b_n and a sequence ϑ = (ϑ_0, ..., ϑ_n) over MC_L, the proof is by induction on n. In the base case, h(σ, ϑ) = φ_I ∧ ϑ_0 is in MC_L, because φ_I is a conjunction of equalities between V and V_0, and ϑ_0 ∈ MC_L by assumption. In the induction step, h(σ, ϑ) = update(h(σ|_{n−1}, ϑ|_{n−1}), a_n) ∧ ϑ_n. By the induction hypothesis, h(σ|_{n−1}, ϑ|_{n−1}) ≡ φ for some φ in MC_L. Thus h(σ, ϑ) ≡ ∃U. φ(U) ∧ Δ_{a_n}(U, V) ∧ ϑ_n. As B is an MC-DDSA, Δ_{a_n}(U, V) is a conjunction of MCs over V ∪ U and constants L, and ϑ_n ∈ MC_L by assumption. By the quantifier elimination property, there exists a quantifier-free MC formula φ' over the variables V_0 ∪ V that is equivalent to ∃U. φ(U) ∧ Δ_{a_n}(U, V) ∧ ϑ_n and mentions only constants in L, so φ' ∈ MC_L.

For C the set of constraints in χ, we now show by induction on j that ℋ_j ⊆ MC_L (up to equivalence) for all j ≥ 0. In the base case (j = 0), the claim follows from (⋆), as all constraints in Φ_0, i.e., in χ, are in MC_L. For j > 0, consider first a formula φ ∈ Φ_j for some b ∈ B. Then φ is of the form φ = ⋁_{φ' ∈ H} ∃U. φ'(V, U) for some H ⊆ ℋ_{j−1}. By the induction hypothesis, H ⊆ MC_L. By the quantifier elimination property of MC formulas, φ is therefore equivalent to an MC formula over V and L in MC_L. As ℋ_j is built over basis Φ_j, the claim follows from (⋆).


Notably, the above quantifier elimination property fails for MCs over integer 
variables; indeed, CTL model checking is undecidable in this case [42, Theorem 
4.1]. 
Integer periodicity constraint systems confine the constraint language to variable-to-constant comparisons and restricted forms of variable-to-variable comparisons. They are used, for instance, in calendar formalisms [18,22]. More precisely, integer periodicity constraint (IPC) atoms have the form x = y, x ⊙ d for ⊙ ∈ {=, ≠, <, >}, x ≡_k y + d, or x ≡_k d, for variables x, y with domain ℤ and k, d ∈ ℕ. A boolean formula whose atoms are IPCs is an IPC formula. A DDSA whose guards are conjunctions of IPCs is an IPC-DDSA, and a CTL*_f formula whose constraint atoms are IPCs is an IPC property. For instance, B_ipc in Example 2 is an IPC-DDSA.
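To make the semantics of the four kinds of IPC atoms concrete, the following sketch evaluates them over integer assignments. It covers only the syntax and meaning of the atoms, not the quantifier elimination of [18]. The function names and the weekday example are ours, and we assume k ≥ 1 in the congruences.

```python
import operator

# Sketch (our own names): IPC atoms as predicates on integer assignments (dicts).
_CMP = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


def var_eq(x, y):
    """x = y"""
    return lambda a: a[x] == a[y]


def var_cmp(x, op, d):
    """x ⊙ d for ⊙ in {=, ≠, <, >}"""
    return lambda a: _CMP[op](a[x], d)


def congr_var(x, k, y, d):
    """x ≡_k y + d, i.e. k divides x - (y + d); assumes k >= 1."""
    return lambda a: (a[x] - (a[y] + d)) % k == 0


def congr_const(x, k, d):
    """x ≡_k d, i.e. k divides x - d; assumes k >= 1."""
    return lambda a: (a[x] - d) % k == 0


def ipc_and(*atoms):
    """Conjunction of IPC atoms, as allowed in IPC-DDSA guards."""
    return lambda a: all(atom(a) for atom in atoms)


# Calendar-style guard: "day is the weekday after prev, and day is positive".
next_weekday = ipc_and(congr_var("day", 7, "prev", 1), var_cmp("day", ">", 0))
```

Python's % operator returns a non-negative remainder for a positive modulus, so the congruence checks are also correct for negative integers.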

Using Corollary 1 and a known quantifier elimination property for IPCs [18, 
Theorem 2], one can show that witness maps are also computable for IPC- 
DDSAs, in a proof that resembles the one of Theorem 3 (see [27, Theorem 4]). 


Theorem 4. For any DDSA B and property χ over integer periodicity constraints, a witness map is computable.


The proofs of both Theorems 3 and 4 rely on the fact that all transition guards and constraints in the verification property are in a finite set of constraints C that is closed under quantifier elimination. Hence, for all φ ∈ C and actions a, update(φ, a) is again equivalent to a formula in C. However, this is not the only way to ensure the requirements of Corollary 1. As a simple example, these requirements are satisfied by a loop-free DDSA, where the number of runs is finite. Interestingly, the cases of MC and IPC systems are also captured by the abstract decidability criterion of Gascon [31], but this need not hold for loop-free DDSAs. Clarifying the relationship between the criteria in Corollary 1 and [31, Thm 4.5] requires further investigation.


Bounded lookback [28] restricts the control flow of a DDSA rather than the 
constraint language, and is a generalization of the earlier feedback-freedom prop- 
erty [14]. Intuitively, k-bounded lookback demands that the behavior of a DDSA 
at any point in time depends only on k events from the past. We refer to [28, 
Definition 5.9] for the formal definition. Systems that enjoy bounded lookback 
allow for decidable linear-time verification [28, Theorem 5.10]. However, we next 
show that this result does not extend to branching time. 


Example 5. Inspired by [42, Theorem 4.1], we reduce control state reachability of two-counter machines (2CM) to the verification problem of CTL*_f formulas in bounded lookback systems. 2CMs have a finite control structure and two counters x_1 and x_2 that can be incremented, decremented, and tested for 0. It is undecidable whether a 2CM will ever reach a designated control state f [43]. For a 2CM M, we build a feedback-free DDSA B = (B, b_I, A, T, B_F, V, α_I, guard) and a CTL*_f property χ such that B satisfies χ iff f is reachable in M. The set B consists of the control states of M, together with an error state e and auxiliary states b_t for transitions t of M, and B_F = {f, e}. The set V consists of x_1, x_2 and auxiliary variables p_1, p_2, m_1, m_2. Zero-test transitions of M are directly modeled in B, whereas a step q → q' that increments x_i by one is modeled as:

[Figure: DDSA fragment modeling an increment of x_i from q to q' via the auxiliary state b_t, with a possible step to the error state e.]
The step q → b_t writes x_i, storing its previous value in p_i. If the write was not an increment by exactly 1, a step to state e is enabled. Decrements are modeled similarly. Intuitively, bounded lookback holds because variable dependencies are limited: in a run of M, a variable dependency that is not an equality extends over at most two time points. (More formally, non-equality paths in the computation graph have length at most 1.) As increments are not exact, B overapproximates M. However, χ = EG(¬EX e) asserts the existence of a path that never allows for a step to e (i.e., it properly simulates M) but reaches the final state f. Thus, B satisfies χ iff f is reachable in M.




6 Implementation 


We implemented our approach in the prototype ada (arithmetic DDS analyzer) in Python; source code, benchmarks, and a web interface are available (https://ctlstar.adatool.dev). As input, the tool takes a CTL* property χ together with a DDSA in JSON format. Alternatively, a given (bounded) Petri net with data (DPN) in PNML format [5] can be transformed into a DDSA. The tool then applies the algorithm in Fig. 1. If successful, it outputs the configuration map returned by checkState(χ), and it can visualize the product constructions. For SMT checks and quantifier elimination, ada interfaces CVC5 [23] and Z3 [17]. Besides numeric variables, ada also supports variables of type boolean and string. For strings, only equality comparison is supported, so different constants can be represented by distinct integers. In addition to the operations in Definition 6, ada allows next operators ⟨a⟩ via an action a, which are useful for verification.

We tested ada on a set of business process models presented as Data Petri nets 
(DPNs) in the literature. As these nets are bounded, they can be transformed 
into DDSAs. The results are reported in the table below. We indicate whether 
the system belongs to a decidable class, the verified property and whether it is 
satisfied by the initial configuration, the verification time, the number of SMT 
checks, and the number of nodes in the DDSA B and the sum of all product 
constructions, respectively. We used CVC5 as SMT solver; times are without 
visualization, which tends to be time-consuming for large graphs. All tests were 
run on an Intel Core i7 with 4×2.60 GHz and 19 GB RAM.
process               | property                                   | sat | time    | checks  | |B| | |N|
(a) road fines        | No deadlock                                | ✗   | 7.0s    | 8161    | 9   | 2052
                      | AG (pr → EF end)                           | ✓   | Z6s     | 7655    |     | 1987
                      | AG (end → total < amount)                  | ✗   | 1m12s   | 111139  |     | 3622
(b) road fines        | No deadlock                                | ✓   | 15m27s  | 247563  | 9   | 4927
                      | AG (p7 → EF end)                           | ✓   | 16m7s   | 246813  |     | 4927
(c) road fines        | No deadlock                                | ✗   | 9s      | 9179    | 9   | 1985
                      | AG (p7 → EF end)                           | ✓   | 66s     | 6382    |     | 597
                      | ψ_c1 = EF (dS > 2160)                      | ✗   | 11.5s   | 17680   |     | 280
                      | ψ_c2 = EF (dP > 1440)                      | ✗   | 10.0s   | 15187   |     | 1280
                      | ψ_c3 = EF (dJ > 1440)                      | ✗   | 10.5s   | 16000   |     | 280
(d) hospital billing  | No deadlock                                | ✓   | 20m59s  | 1234928 | 17  | 23147
                      | ψ_d1 = EF (p16 ∧ sclosed)                  | ✓   | 10m20s  | 669379  |     | 10654
(e) sepsis            | No deadlock                                | ✓   | 1m386s  | 39      | 301 | 44939
                      | ψ_e1 = AG (sink → ttr < tab)               | ✗   | 30.1s   | 70      |     | 22724
                      | ψ_e2 = AG (sink → ttr + 60 > tab)          | ✓   | 32s     | 53      |     | 22538
(f) sepsis            | No deadlock                                | ✓   | 7m24s   | 4524    | 301 | 161242
                      | ψ_f1 = A(alacticAcidG ⟨diagnostic⟩⊤)       | ✓   | 3m53s   | 5734    |     | 74984
(g) board: register   | No deadlock                                | ✓   | 1.4s    | 12      | 7   | 27
(h) board: transfer   | No deadlock                                | ✓   | 1.4s    | 27      | 7   | 51
(i) board: discharge  | No deadlock                                | ✓   | 1.5s    | 25      | 6   | 67
                      | ψ_i1 = AG (p2 ∧ 01=207 → AG 01=207)        | ✓   | 1.5s    | 94      |     | 91
                      | ψ_i2 = A (EF ⟨tra⟩⊤ ∧ EF ⟨his⟩⊤)           | ✓   | 15s     | 27      |     | 98
                      | ψ_i3 = ¬E (F ⟨tra⟩⊤ ∧ F ⟨his⟩⊤)            | ✓   | 14s     | 56      |     | 43
(j) credit approval   | No deadlock                                | ✓   | 1.7s    | 470     | 6   | 230
                      | ψ_j1 = AG (⟨openLoan⟩⊤ → ver ∧ dec)        | ✓   | 13.2s   | 14156   |     | 645
                      | ψ_j2 = A (F (ver ∧ dec) → F ⟨openLoan⟩⊤)   | ✗   | 3.7s    | 3128    |     | 316
                      | ψ_j3 = A (F (ver ∧ dec) → EF ⟨openLoan⟩⊤)  | ✓   | 5.6s    | 4748    |     | 548
(k) package handling  | No deadlock                                | ✓   | 2.7s    | 1025    | 16  | 693
                      | No deadlock(τ1)                            | ✓   | 2.5s    | 1079    |     | 398
                      | ψ_k1 = EF ⟨fetch⟩⊤                         | ✗   | 2.6s    | 850     |     | 343
                      | ψ_k2 = EF ⟨τ6⟩⊤                            | ✗   | 2.4s    | 875     |     | 336
(l) auction           | No deadlock                                | ✗   | 10.8s   | 1683    | 5   | 186
                      | EF (sold ∧ d > 0 ∧ 0 < t)                  | ✗   | 6.4s    | 1180    |     | 79
                      | EF (b = 1 ∧ 0 > t ∧ F (sold ∧ b > 1))      | ✓   | 26.5s   | 4000    |     | 263


We briefly comment on the benchmarks and some properties. For all examples we checked no deadlock, which abbreviates AG EF χ_f, where χ_f is a disjunction of all final states. This is one of the two requirements of the crucial soundness property (cf. Example 1). Weak soundness [4] relaxes this requirement to demand only that if a transition is reachable, it does not lead to deadlocks. We call this property no deadlock(a); it is expressed by EF(⟨a⟩⊤) → AG(⟨a⟩⊤ → F χ_f). One can also check whether a specific state p is deadlock-free, via AG(p → EF χ_f).


(a)–(c) are versions of the road fine management process (cf. Example 1). Versions (a) [40, Fig. 12.7] and (b) [37, Fig. 13] were mined automatically from logs, while (c) is the normative version [41, Fig. 7] shown in Example 1. No deadlock is violated in (a) and (c), and this issue was fixed in version (b). The fact that ψ_c1, ψ_c2, and ψ_c3 are not satisfied confirms that the time constraints are never violated.

(d) models a billing process in a hospital [40, Fig. 15.3], which is deadlock-free. 

(e) is a normative model for a sepsis triage process in a hospital [40, Fig. 13.3], and (f) is a variation that was mined purely automatically from logs [40, Fig. 13.6]. According to [40, Sect. 13], triage should happen before antibiotics are administered. This requirement is expressed by ψ_e1, which is actually not satisfied. However, the desired time constraint expressed by ψ_e2 holds.

(g)–(i) reflect activities in patient logistics of a hospital, based on logs of real-life processes [40, Fig. 14.3]. In all three systems the no deadlock property is satisfied by the initial configuration, but the output of ada reveals that for (h) this need not hold for other initial assignments.

(j) is a credit approval process [16, Fig. 3]. It is deadlock-free; ψ_j1 and ψ_j2 verify desirable conditions under which a loan is granted to a client.

(k) is a package handling routine [26, Fig. 5]. The fact that the properties ψ_k1 and ψ_k2 are not satisfied shows that the transitions τ6 and fetch are dead.

(l) models an auction process [28, Example 1.1], for which ada reveals a deadlock. Results for two further properties from [28, Example 1.1] are listed as well.


Seven systems are in a decidable class with respect to the listed properties: (a), (b), (d), (f), (h), (i), and (k) are MC, and (d), (h), (i), and (k) are also IPC. This is because automatic mining techniques often produce monotonicity constraints [39].


7 Conclusion 


This paper presents a technique to compute witness maps for a given DDSA and CTL*_f property, where a witness map specifies conditions on the initial variable assignment such that the property holds. The addressed problem is thus a slight generalization of the common verification problem. Our model checking procedure need not terminate in general, but we show that it does if an abstract property on history constraints holds. Moreover, witness maps are always computable for monotonicity and integer periodicity constraint systems. However, this result does not extend to bounded lookback systems. We implemented our approach in the tool ada and showed its usefulness on a range of business process models.

We see various opportunities to extend this work. A richer verification language could support past time operators [18] and future variable values [20]. Further decidable fragments could be sought using covers [33], or aiming for compatibility with locally finite theories [32]. Moreover, a restricted version of the bounded lookback property could guarantee decidability of CTL*_f, similarly to the way feedback freedom was strengthened in [35]. The implementation could be made more efficient by avoiding the computation of many similar formulas. Finally, the complexity class that our approach implies for CTL*_f in the decidable classes is yet to be clarified.
Abstract. We present a decision procedure for intermediate logics relying on a modular extension of the SAT-based prover intuitR for IPL (Intuitionistic Propositional Logic). Given an intermediate logic L and a formula α, the procedure outputs either a Kripke countermodel for α or the instances of the characteristic axioms of L that must be added to IPL in order to prove α. The procedure exploits an incremental SAT-solver; during the computation, new clauses are learned and added to the solver.


1 Introduction 


Recently, Claessen and Rosén have introduced intuit [4], an efficient decision procedure for Intuitionistic Propositional Logic (IPL) based on the Satisfiability Modulo Theories (SMT) approach. The prover language consists of two kinds of clauses. Flat clauses of the form ⋀A_1 → ⋁A_2 (with A_1 and A_2 sets of atoms) are fed to the SAT-solver. Implication clauses have the form (a → b) → c (a, b, c atoms). Thus, we need an auxiliary clausification procedure to preprocess the input formula. The search is performed via a suitable variant of the DPLL(T) procedure [16], exploiting an incremental SAT-solver. Whenever a semantic conflict is thrown during the computation, a new clause is learned and added to the SAT-solver. As discussed in [9], there is a close connection between the intuit approach and known proof-theoretic methods. In fact, the decision procedure mimics the standard root-first proof search strategy for a sequent calculus strongly connected with Dyckhoff's calculus LJT [5] (alias G4ip). To improve performance, we have re-designed the prover by adding a restart operation, thus obtaining intuitR [8] (intuit with Restart). Unlike intuit, the intuitR procedure has a simple structure, consisting of two nested loops. Given a formula α, if α is provable in IPL the call intuitR(α) yields a derivation of α in the sequent calculus introduced in [8], a plain calculus where derivations have a single branch. If α is not provable in IPL, the outcome of intuitR(α) is a (typically small) countermodel for α, namely a Kripke model falsifying α. We stress that intuitR is highly performant: on a standard benchmark suite, it outperforms intuit and other state-of-the-art provers (in particular, fCube [6] and intHistGC [12]).
In this paper we present intuitRIL, an extension of intuitR to Intermediate Logics, namely propositional logics extending IPL and contained in CPL (Classical Propositional Logic). Specifically, let α be a formula and L an axiomatizable intermediate logic having Kripke semantics; the call intuitRIL(α, L) tries to prove the validity of α in L. To this aim, the prover searches for a set Ψ containing instances of Ax(L), the characteristic axioms of L, such that α can be proved in IPL from Ψ. This differs from other approaches, where the focus is on the synthesis of specific inference rules for the logic at hand (see, e.g., [17]). Basically, intuitRIL(α, L) searches for a countermodel K for α, exploiting the search engine of intuitR. Whenever we get K, we check whether K is a model of L. If this is the case, we conclude that α is not valid in L (and K is a witness to this). Otherwise, the prover selects an instance ψ of Ax(L) falsified in K (at least one exists). This ψ is acknowledged as a learned axiom and, after clausification, it is fed to the SAT-solver. We stress that a naive implementation, where the computation restarts from scratch at each iteration of the main loop, would be highly inefficient. Each time, the SAT-solver would have to be initialized by inserting all the clauses encoding the input problem and all the clauses learned so far. Instead, we exploit an incremental SAT-solver, where clauses can be added but never deleted, so all the simplifications and optimisations performed by the solver are preserved. Note that this prevents us from exploiting strategies based on standard sequent/tableaux calculi, where backtracking is required.
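The control flow of intuitRIL(α, L) described in this paragraph can be summarised as a loop. This is only an illustration of the description above, not the authors' Haskell implementation. The solver interface (add, solve), the callback names, and the toy solver are hypothetical, and clausification is left out.

```python
# Hypothetical interfaces: `solver` exposes add(clause) and solve(), where solve()
# returns ("proved", derivation) or ("countermodel", model); `is_l_model(model)` and
# `falsified_instance(model)` implement the logic-specific learning mechanism for L.


def intuit_ril_loop(solver, alpha, is_l_model, falsified_instance):
    """Sketch of the learning loop; termination is not guaranteed in general."""
    learned = []
    solver.add(("goal", alpha))              # the input problem is loaded only once
    while True:
        verdict, witness = solver.solve()
        if verdict == "proved":
            return "valid", learned          # α follows in IPL from the learned axioms
        if is_l_model(witness):
            return "not valid", witness      # an L-countermodel for α
        psi = falsified_instance(witness)    # an instance of Ax(L) falsified in the model
        learned.append(psi)
        solver.add(("axiom", psi))           # incremental: clauses are only ever added


class ToySolver:
    """Stand-in for an incremental solver: it 'proves' the goal once 'em' is learned."""

    def __init__(self):
        self.clauses = []

    def add(self, clause):
        self.clauses.append(clause)

    def solve(self):
        if ("axiom", "em") in self.clauses:
            return "proved", None
        return "countermodel", "two-world model"


valid_run = intuit_ril_loop(ToySolver(), "p or not p", lambda m: False, lambda m: "em")
cm_run = intuit_ril_loop(ToySolver(), "p or not p", lambda m: True, lambda m: "em")
```

In the first run the countermodel is not an L-model, so one axiom instance is learned before the goal is proved. In the second run the first countermodel already is an L-model and is returned as the witness.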

If the call intuitRIL(α, L) succeeds, by tracking the computation we get a derivation D of α in the sequent calculus C_L (see Fig. 1). From D we can extract all the axioms learned during the computation. We stress that the procedure is quite modular. To handle a logic L, one only has to implement a specific learning mechanism for L: if K is not a model of L, pick an instance of Ax(L) falsified in K. The main drawback is that there is no general way to bound the learned axioms, so termination must be investigated on a case-by-case basis. We guarantee termination for some relevant intermediate logics. These include Gödel-Dummett Logic GL, the family GL_n (n ≥ 1) of Gödel-Dummett Logics with depth bounded by n, and Jankov Logic (for a presentation of such logics see [2]). GL_2 coincides with Here and There Logic, well known for its applications in Answer Set Programming [15]. As a corollary, for each of the mentioned logics L we get a bounding function [3]: given α, we compute a bounded set Ψ_α of instances of Ax(L) such that α is valid in L iff α is provable in IPL from the assumptions Ψ_α. In general, we improve the bounds in [1,3]. The intuitRIL Haskell implementation and other additional material (e.g., the omitted proofs) can be downloaded at https://github.com/cfiorentini/intuitRIL.


2 Basic Definitions 


Formulas, denoted by lowercase Greek letters, are built from an enumerable set of propositional variables 𝒱, the constant ⊥, and the connectives ∧, ∨, →. Moreover, ¬α stands for α → ⊥ and α ↔ β stands for (α → β) ∧ (β → α). Elements of the set 𝒱 ∪ {⊥} are called atoms and are denoted by lowercase Roman letters. Uppercase Greek letters denote sets of formulas. By 𝒱_α we denote the set of propositional variables occurring in α. The notation is extended to sets: 𝒱_Γ is the union of the sets 𝒱_α with α ∈ Γ, and 𝒱_{Γ,Γ'} and 𝒱_{Γ,α} stand for 𝒱_{Γ∪Γ'} and 𝒱_{Γ∪{α}}, respectively. A substitution is a map from propositional variables to formulas. By [p_1 ↦ α_1, ..., p_n ↦ α_n] we denote the substitution χ such that χ(p) = α_i if p = p_i and χ(p) = p otherwise. The set {p_1, ..., p_n} is the domain of χ, denoted by Dom(χ), and ε is the substitution with empty domain. The application of χ to a formula α, denoted by χ(α), is defined as usual, and χ(Γ) is the set of χ(α) such that α ∈ Γ. The composition χ_1 · χ_2 is the substitution mapping p to χ_1(χ_2(p)).

A (classical) interpretation M is a subset of 𝒱, identifying the propositional variables assigned to true. By M ⊨ α we mean that α is true in M, and M ⊨ Γ iff M ⊨ α for every α ∈ Γ. Classical Propositional Logic (CPL) is the set of formulas true in every interpretation. We write Γ ⊢_c α iff, for every M, M ⊨ Γ implies M ⊨ α. Note that α is CPL-valid (namely, α ∈ CPL) iff ∅ ⊢_c α.
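Only the variables in 𝒱_{Γ,α} affect truth. For finite Γ, Γ ⊢_c α can therefore be decided by enumerating the interpretations over those variables. The following truth-table sketch makes this concrete; the encoding and names are ours. It is exponential in the number of variables and is only meant as an illustration.

```python
from itertools import product

# Hypothetical encoding: formulas are tuples ("var", p), ("bot",), and
# (op, left, right) with op in {"and", "or", "imp"}; an interpretation M is
# the set of variables assigned to true.


def true_in(m, alpha):
    """M ⊨ α."""
    tag = alpha[0]
    if tag == "var":
        return alpha[1] in m
    if tag == "bot":
        return False
    left, right = true_in(m, alpha[1]), true_in(m, alpha[2])
    if tag == "and":
        return left and right
    if tag == "or":
        return left or right
    if tag == "imp":
        return (not left) or right
    raise ValueError(f"unknown connective {tag!r}")


def variables(alpha):
    """𝒱_α, the propositional variables occurring in α."""
    if alpha[0] == "var":
        return {alpha[1]}
    if alpha[0] == "bot":
        return set()
    return variables(alpha[1]) | variables(alpha[2])


def entails_c(gamma, alpha):
    """Γ ⊢_c α for finite Γ, enumerating interpretations over 𝒱_{Γ,α} only."""
    vs = sorted(set().union(variables(alpha), *(variables(g) for g in gamma)))
    for bits in product([False, True], repeat=len(vs)):
        m = {v for v, bit in zip(vs, bits) if bit}
        if all(true_in(m, g) for g in gamma) and not true_in(m, alpha):
            return False
    return True


p, q = ("var", "p"), ("var", "q")
excluded_middle = ("or", p, ("imp", p, ("bot",)))
peirce = ("imp", ("imp", ("imp", p, q), p), p)
```

Both excluded middle and Peirce's law are CPL-valid, which the sketch confirms via ∅ ⊢_c α.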

A (rooted) Kripke model is a quadruple $K = \langle W, \leq, r, V \rangle$ where $W$ is a finite
and non-empty set (the set of worlds), $\leq$ is a reflexive and transitive binary
relation over $W$, the world $r$ (the root of $K$) is the minimum of $W$ w.r.t. $\leq$, and
$V : W \to 2^{\mathcal{V}}$ (the valuation function) is a map obeying the persistence condition:
for every pair of worlds $w_1$ and $w_2$ of $K$, $w_1 \leq w_2$ implies $V(w_1) \subseteq V(w_2)$; the
triple $\langle W, \leq, r \rangle$ is called (Kripke) frame. The valuation $V$ is extended to a forcing
relation between worlds and formulas as follows:

$w \Vdash p$ iff $p \in V(w)$, for every $p \in \mathcal{V}$;   $w \nVdash \bot$;
$w \Vdash \alpha \land \beta$ iff $w \Vdash \alpha$ and $w \Vdash \beta$;
$w \Vdash \alpha \lor \beta$ iff $w \Vdash \alpha$ or $w \Vdash \beta$;
$w \Vdash \alpha \to \beta$ iff $\forall w' \geq w$, $w' \Vdash \alpha$ implies $w' \Vdash \beta$.


By $w \Vdash \Gamma$ we mean that $w \Vdash \alpha$ for every $\alpha \in \Gamma$. A formula $\alpha$ is valid in the
frame $\langle W, \leq, r \rangle$ iff for every valuation $V$, $r \Vdash \alpha$ in the model $\langle W, \leq, r, V \rangle$. Propositional
Intuitionistic Logic (IPL) is the set of formulas valid in all frames. Accordingly,
if there is a model $K$ such that $r \nVdash \alpha$ (here and below $r$ designates the root
of $K$), then $\alpha$ is not IPL-valid; we call $K$ a countermodel for $\alpha$. We write $\Gamma \models_i \delta$
iff, for every model $K$, $r \Vdash \Gamma$ implies $r \Vdash \delta$; thus, $\alpha$ is IPL-valid iff $\emptyset \models_i \alpha$.
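The forcing clauses can be evaluated directly on a finite model. In this sketch (hypothetical names and tuple encoding of formulas, `"bot"` for $\bot$), the order $\leq$ is given explicitly as a set of pairs, including the reflexive ones, and the valuation maps each world to its set of true variables; persistence is assumed, not checked. The two-world model below is a countermodel for $a \lor \neg a$ that still forces $\neg\neg a$ at its root.

```python
BOT = "bot"  # hypothetical encoding: variables are strings, "bot" is falsum


def forces(leq, val, w, f):
    """w ||- f in the finite Kripke model with order leq (a set of pairs)
    and valuation val (world -> set of variables true there)."""
    if isinstance(f, str):
        return f != BOT and f in val[w]
    op, left, right = f
    if op == "and":
        return forces(leq, val, w, left) and forces(leq, val, w, right)
    if op == "or":
        return forces(leq, val, w, left) or forces(leq, val, w, right)
    # op == "imp": quantify over every world v with w <= v
    return all(not forces(leq, val, v, left) or forces(leq, val, v, right)
               for (u, v) in leq if u == w)


# Two worlds r <= w1 (reflexive pairs included); a is true only in w1.
leq = {("r", "r"), ("w1", "w1"), ("r", "w1")}
val = {"r": set(), "w1": {"a"}}
neg_a = ("imp", "a", BOT)
print(forces(leq, val, "r", ("or", "a", neg_a)))   # False: r does not force a v ~a
print(forces(leq, val, "r", ("imp", neg_a, BOT)))  # True: r forces ~~a
```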

Let L be one of the logics IPL and CPL; then, L is closed under modus
ponens ($\{\alpha, \alpha \to \beta\} \subseteq L$ implies $\beta \in L$) and under substitution (for every $\chi$,
$\alpha \in L$ implies $\chi(\alpha) \in L$). An intermediate logic is any set of formulas L such
that $\mathrm{IPL} \subseteq L \subseteq \mathrm{CPL}$ and L is closed under modus ponens and under substitution. A
model $K$ is an L-model iff $r \Vdash L$; if, moreover, $r \nVdash \alpha$, we say that $K$ is an L-countermodel for
$\alpha$. An intermediate logic L can be characterized by a set of CPL-valid formulas,
called the L-axioms and denoted by Ax(L). An L-axiom $\psi$ of Ax(L) must be
understood as a schematic formula, representing all the formulas of the kind
$\chi(\psi)$; we call $\chi(\psi)$ an instance of $\psi$. Formally, IPL + Ax(L) is the intermediate
logic collecting the formulas $\alpha$ such that $\Psi \models_i \alpha$ for some finite set $\Psi$ of
instances of L-axioms from Ax(L). A bounding function for L is a map that,
given $\alpha$, yields a finite set $\Psi_\alpha$ of instances of L-axioms such that, if $\alpha$ is L-valid, then $\Psi_\alpha \models_i \alpha$. If L
admits a computable bounding function, we can reduce L-validity to IPL-validity
(see [3] for an in-depth discussion). Let $\mathcal{F}$ be a class of frames and let Log($\mathcal{F}$)
be the set of formulas valid in all frames of $\mathcal{F}$; then, Log($\mathcal{F}$) is an intermediate
logic. A logic L has Kripke semantics iff there exists a class of frames $\mathcal{F}$ such
that L = Log($\mathcal{F}$); we also say that L is characterized by $\mathcal{F}$. Henceforth, when we


60 C. Fiorentini and M. Ferrari 


mention a logic L, we tacitly assume that L is an axiomatizable intermediate
logic having Kripke semantics.


Example 1 (GL). A well-known intermediate logic is Gödel-Dummett logic
GL [2], characterized by the class of linear frames. An axiomatization of GL
is obtained by adding the linearity axiom $\mathrm{lin} = (a \to b) \lor (b \to a)$ to IPL. Using
the terminology of [3], GL is formula-axiomatizable: a bounding function for GL
is obtained by mapping $\alpha$ to the set $\Psi_\alpha$ of instances of lin where $a$ and $b$ are
replaced with subformulas of $\alpha$. In [1] it is proved that it is sufficient to consider
the subformulas of $\alpha$ of the kind $p \in \mathcal{V}_\alpha$, $\neg\beta$, $\beta_1 \to \beta_2$. In Lemma 4 we further
improve this bound taking as bounding function the following map:

$$\mathrm{Ax}_{\mathrm{GL}}(\alpha) = \{(a \to b) \lor (b \to a) \mid a, b \in \mathcal{V}_\alpha\} \cup \{(a \to \neg a) \lor (\neg a \to a) \mid a \in \mathcal{V}_\alpha\}$$
$$\cup\ \{(a \to (a \to b)) \lor ((a \to b) \to a) \mid a, b \in \mathcal{V}_\alpha\}$$

Thus, if $\mathcal{V}_\alpha = \{a\}$, the only instance of lin to consider is $(a \to \neg a) \lor (\neg a \to a)$,
independently of the size of $\alpha$ (the other instances are IPL-valid and can be
omitted). As pointed out in [3], GL is not variable-axiomatizable, namely: it is
not sufficient to consider instances of lin obtained by replacing $a$ and $b$ with
variables from $\mathcal{V}_\alpha$. As an example, let $\alpha = \neg a \lor \neg\neg a$; $\alpha$ is GL-valid, the only
variable-replacement instance of lin is $\psi_a = (a \to a) \lor (a \to a)$ and $\psi_a \not\models_i \alpha$.
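The map $\mathrm{Ax}_{\mathrm{GL}}$ depends only on $\mathcal{V}_\alpha$. The following Python sketch enumerates it under a tuple encoding of formulas chosen for illustration (variables are strings, `"bot"` is $\bot$); the names `lin` and `ax_gl` are hypothetical. As a design choice it skips the instances with $a = b$, which are IPL-valid, so it returns $n + 2n(n-1)$ formulas for $n = |\mathcal{V}_\alpha|$, a quantity that does not depend on the size of $\alpha$.

```python
from itertools import permutations

BOT = "bot"  # hypothetical encoding: variables are strings, "bot" is falsum


def imp(x, y):
    return ("imp", x, y)


def lin(x, y):
    """The instance (x -> y) v (y -> x) of the linearity axiom."""
    return ("or", imp(x, y), imp(y, x))


def ax_gl(vs):
    """Instances of Ax_GL(alpha) for V_alpha = vs, omitting the IPL-valid ones with a = b."""
    vs = sorted(vs)
    instances = [lin(a, imp(a, BOT)) for a in vs]  # (a -> ~a) v (~a -> a)
    for a, b in permutations(vs, 2):
        instances.append(lin(a, b))                # (a -> b) v (b -> a)
        instances.append(lin(a, imp(a, b)))        # (a -> (a -> b)) v ((a -> b) -> a)
    return instances


print(len(ax_gl({"a"})), len(ax_gl({"a", "b"})), len(ax_gl(set("abcd"))))  # 1 6 28
```

Both orientations $(a, b)$ and $(b, a)$ of the first family are generated, as in the definition, even though they are the same formula up to commutativity of $\lor$; a tighter enumeration could keep only one of them.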
We review the main concepts about the clausification procedure described
in [4]. Clauses $\varphi$ and implication clauses $\lambda$ are defined as

$\varphi := \bigwedge A_1 \to \bigvee A_2$, with $A_k \subseteq \mathcal{V} \cup \{\bot\}$ for $k \in \{1, 2\}$;
$\lambda := (a \to b) \to c$, with $a \in \mathcal{V}$ and $\{b, c\} \subseteq \mathcal{V} \cup \{\bot\}$.

Here $\bigwedge A_1$ and $\bigvee A_2$ denote the conjunction and the disjunction of the atoms
in $A_1$ and $A_2$ respectively ($\bigwedge\{a\} = \bigvee\{a\} = a$). Henceforth, $\bigwedge\emptyset \to \bigvee A_2$ must
be read as $\bigvee A_2$; $R, R_1, \dots$ denote sets of clauses, $X, X_1, \dots$ sets of implication
clauses. Given a set of implication clauses $X$, the closure of $X$, denoted by $(X)^*$,
is the set of clauses $b \to c$ such that $(a \to b) \to c \in X$.
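Under a hypothetical encoding in which a clause $\bigwedge A_1 \to \bigvee A_2$ is a pair of frozensets $(A_1, A_2)$ and an implication clause $(a \to b) \to c$ is a triple `(a, b, c)`, the closure is a one-line comprehension. The names `closure` and `show_clause` are illustrative; the sample set is the $X$ obtained in Example 2.

```python
def closure(X):
    """(X)*: the clause b -> c for every implication clause (a -> b) -> c in X.
    Implication clauses are triples (a, b, c); a clause AND(A1) -> OR(A2) is a
    pair (A1, A2) of frozensets of atoms (hypothetical encoding)."""
    return {(frozenset({b}), frozenset({c})) for (_a, b, c) in X}


def show_clause(clause):
    body, head = clause
    return " & ".join(sorted(body)) + " -> " + " | ".join(sorted(head))


X = {("a", "b", "p0"), ("b", "a", "p1")}  # (a -> b) -> p0 and (b -> a) -> p1
print(sorted(map(show_clause, closure(X))))  # ['a -> p1', 'b -> p0']
```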

The following lemma states some properties of clauses and closures. 


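Lemma 1. (i) $R \models_i g$ iff $R \models_c g$, for every set of clauses $R$ and every atom $g$.
(ii) $X \models_i b \to c$, for every $b \to c \in (X)^*$.
(iii) $\Gamma \models_i \alpha$ iff $\alpha \leftrightarrow g, \Gamma \models_i g$, where $g \notin \mathcal{V}_{\Gamma,\alpha}$.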


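Clausification. We assume a procedure Clausify that, given a formula $\alpha$, computes
sets of clauses $R$ and $X$ equivalent to $\alpha$ w.r.t. IPL. Formally, let $\alpha$ be a
formula and let $V$ be a set of propositional variables such that $\mathcal{V}_\alpha \subseteq V$. The
procedure Clausify($\alpha$, $V$) computes a triple $(R, X, \chi)$ satisfying:

(C1) $\Gamma, \alpha \models_i \delta$ iff $\Gamma, R, X \models_i \delta$, for every $\Gamma$ and $\delta$ such that $\mathcal{V}_{\Gamma,\delta} \subseteq V$.
(C2) $\mathrm{Dom}(\chi) = \mathcal{V}_{R,X} \setminus V$ and $\mathcal{V}_{\chi(p)} \subseteq V$ for every $p \in \mathrm{Dom}(\chi)$.
(C3) $R, X \models_i p \leftrightarrow \chi(p)$ for every $p \in \mathrm{Dom}(\chi)$.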
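$$\frac{R \models_c g}{R, X \Rightarrow g}\ \mathrm{cpl}_0
\qquad
\frac{R, A \models_c b \qquad R, \varphi, X \Rightarrow g}{R, X \Rightarrow g}\ \mathrm{cpl}_1(\lambda)
\quad
\begin{array}{l} \lambda = (a \to b) \to c \in X \\ A \subseteq \mathcal{V}_{R,X,g} \\ \varphi = \bigwedge(A \setminus \{a\}) \to c \end{array}$$

$$\frac{R, (X)^*, X \Rightarrow g}{\Rightarrow \alpha}\ \mathrm{Claus}_0(g, \chi)
\quad
\begin{array}{l} g \notin \mathcal{V}_\alpha \\ (R, X, \chi) = \mathrm{Clausify}(\alpha \leftrightarrow g, \mathcal{V}_{\alpha,g}) \end{array}$$

$$\frac{R, R', (X')^*, X, X' \Rightarrow g}{R, X \Rightarrow g}\ \mathrm{Claus}_1(\psi, \chi)
\quad
\begin{array}{l} \psi \in \mathrm{Ax}(L, \mathcal{V}_{R,X,g}) \\ (R', X', \chi) = \mathrm{Clausify}(\psi, \mathcal{V}_{R,X,g}) \end{array}$$

$R$ is a set of clauses, $X$ is a set of implication clauses, $g$ is an atom;

$$\pi(\rho) = \begin{cases} (\emptyset,\ [g \mapsto \alpha] \cdot \chi) & \text{if } \rho = \mathrm{Claus}_0(g, \chi) \\ (\{\psi\},\ \chi) & \text{if } \rho = \mathrm{Claus}_1(\psi, \chi) \\ (\emptyset,\ \epsilon) & \text{otherwise} \end{cases}$$

Fig. 1. The sequent calculus $\mathcal{C}_L$.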


Basically, clausification introduces new propositional variables to represent subformulas
of $\alpha$; as a result we obtain a substitution $\chi$ which tracks the mapping
of the new variables. Condition (C1) states that $\alpha$ can be replaced by $R \cup X$ in
IPL reasoning. By (C2) the domain of $\chi$ consists of the new variables introduced
in the clausification process. The following properties easily follow from (C1)-(C3):


(P1) $R, X \models_i \alpha$.   (P2) $R, X \models_i \beta \leftrightarrow \chi(\beta)$ for every formula $\beta$.


We exploit a Clausify procedure essentially similar to the one described
in [4], with slight modifications in order to match (C3). As discussed in [4], in IPL
we can use a weaker condition (either $R, X \models_i p \to \chi(p)$ or $R, X \models_i \chi(p) \to p$,
according to the case). It is not obvious whether the weaker condition would be
more efficient; in many cases strong equivalences perform better, possibly
because they trigger more simplifications in the SAT-solver.


Example 2. Let $\alpha = (a \to b) \lor (b \to a)$ and $V = \{a, b\}$. The call Clausify($\alpha$, $V$)
introduces the new variables $p_0$ and $p_1$ associated with the subformulas $a \to b$
and $b \to a$ respectively. Accordingly, the obtained sets $R$ and $X$ must satisfy
$R, X \models_i p_0 \leftrightarrow (a \to b)$ and $R, X \models_i p_1 \leftrightarrow (b \to a)$. We get:

$R = \{p_0 \lor p_1,\ p_0 \land a \to b,\ p_1 \land b \to a\}$   $\chi = [p_0 \mapsto a \to b,\ p_1 \mapsto b \to a]$
$X = \{(a \to b) \to p_0,\ (b \to a) \to p_1\}$


3 The Calculus $\mathcal{C}_L$


Let L be an intermediate logic; we introduce the sequent calculus $\mathcal{C}_L$ to prove
L-validity. We assume that L is axiomatized by a set Ax(L) of L-axioms; by




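The derivation $\mathcal{D}$, read from the root upwards (only its unique branch of sequents is shown):
$\Rightarrow \alpha$ by $\rho_0 = \mathrm{Claus}_0$ from $\sigma_0 = R_0, X_0 \Rightarrow g$;
$\sigma_{i-1}$ by $\rho_i$ from $\sigma_i = R_i, X_i \Rightarrow g$, where $\rho_i = \mathrm{cpl}_1$ or $\rho_i = \mathrm{Claus}_1$, for every $i \in \{1, \dots, n-1\}$;
$\sigma_{n-1} = R_{n-1}, X_{n-1} \Rightarrow g$ by $\rho_n = \mathrm{cpl}_0$ from $R_{n-1} \models_c g$.

$\pi(\mathcal{D}) = (\Psi_0 \cup \dots \cup \Psi_n,\ \chi_0 \cdot \ldots \cdot \chi_n)$, where $(\Psi_j, \chi_j) = \pi(\rho_j)$.

Fig. 2. A $\mathcal{C}_L$-derivation of $\Rightarrow \alpha$.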


Ax(L, $V$) we denote the set of instances $\psi$ of L-axioms such that $\mathcal{V}_\psi \subseteq V$. The
calculus relies on a clausification procedure Clausify satisfying conditions (C1)-(C3)
and acts on sequents $\Gamma \Rightarrow \delta$ such that:

- either $\Gamma = \emptyset$, or $\Gamma = R \cup X$ with $(X)^* \subseteq R$ and $\delta$ is an atom.


Rules of $\mathcal{C}_L$ are displayed in Fig. 1. Rule $\mathrm{cpl}_0$ (initial rule) can only be applied
if the condition $R \models_c g$ holds; if this is the case, the conclusion $R, X \Rightarrow g$ is an
initial sequent, namely a top sequent of a derivation. The other rules depend on
parameters that are made explicit in the rule name. A bottom-up application of
$\mathrm{cpl}_1$ requires the choice of an implication clause $\lambda = (a \to b) \to c$ from $X$, which we
call the main formula, and the selection of a set of atoms $A \subseteq \mathcal{V}_{R,X,g}$ such that
$R, A \models_c b$, where $b$ is the middle variable in $\lambda$. As discussed in [8,9], $\mathrm{cpl}_1$ is a
sort of generalization of the rule $L{\to}{\to}$ of the sequent calculus LJT/G4ip for
IPL [5,18]. Rules $\mathrm{Claus}_0$ and $\mathrm{Claus}_1$ exploit the clausification procedure. Rule
$\mathrm{Claus}_0$ requires the clausification of the formula $\alpha \leftrightarrow g$, with $g$ a new atom
($g \notin \mathcal{V}_\alpha$); in rule $\mathrm{Claus}_1$, the clausified formula $\psi$ is selected from Ax(L, $\mathcal{V}_{R,X,g}$).
In both cases, the clauses returned by Clausify are stored in the premise of
the applied rule and the computed substitution $\chi$ is displayed in the rule name;
moreover, $\mathrm{Claus}_0$ is annotated with the new atom $g$ and $\mathrm{Claus}_1$ with the chosen
L-axiom $\psi$. To recover the relevant information associated with the application
of a rule $\rho$, in Fig. 1 we define the pair $\pi(\rho) = (\Psi, \chi)$, where $\Psi$ is a set of instances
of L-axioms and $\chi$ is a substitution. $\mathcal{C}_L$-trees and $\mathcal{C}_L$-derivations are defined as
usual (see e.g. [18]); a sequent $\sigma$ is provable in $\mathcal{C}_L$ iff there exists a $\mathcal{C}_L$-derivation
having root sequent $\sigma$. Let us consider a $\mathcal{C}_L$-derivation $\mathcal{D}$ of $\Rightarrow \alpha$ (see Fig. 2).
Reading the derivation bottom-up, the first applied rule is $\mathrm{Claus}_0$. After such
an application, the obtained sequents have the form $\sigma_k = R_k, X_k \Rightarrow g$, where
$R_k \cup X_k$ is non-empty, thus rule $\mathrm{Claus}_0$ cannot be applied any more; the rule
applied at the top is $\mathrm{cpl}_0$. Note that $\mathcal{D}$ contains a unique branch, consisting of
the sequents $\Rightarrow \alpha, \sigma_0, \dots, \sigma_{n-1}$. In Fig. 2 we also define the pair $\pi(\mathcal{D}) = (\Psi, \chi)$:
$\Psi$ collects the (instances of) L-axioms selected by rule $\mathrm{Claus}_1$, and $\chi$ is obtained by
composing the substitutions associated with the applied rules. The definition of
$\pi(T)$, with $T$ a $\mathcal{C}_L$-tree, is similar. By $T(\alpha; R, X \Rightarrow g)$ we denote a $\mathcal{C}_L$-tree
having root $\Rightarrow \alpha$ and leaf $R, X \Rightarrow g$. Given a $\mathcal{C}_L$-tree $T$, $\mathcal{V}_T$ is the set of
variables occurring in $T$. We state some properties about $\mathcal{C}_L$-trees:
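Lemma 2. Let $T = T(\alpha; R, X \Rightarrow g)$ and let $\pi(T) = (\Psi, \chi)$.

(i) $\mathcal{V}_{\chi(p)} \subseteq \mathcal{V}_\alpha$, for every $p \in \mathcal{V}_T$.
(ii) $R, X \models_i \beta \leftrightarrow \chi(\beta)$, for every formula $\beta$.
(iii) If $R, X, \Gamma \models_i g$ and $\mathcal{V}_\Gamma \subseteq \mathcal{V}_\alpha$, then $\Gamma, \chi(\Psi) \models_i \alpha$.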


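Proposition 1. Let $\mathcal{D}$ be a $\mathcal{C}_L$-derivation of $\Rightarrow \alpha$ and let $\pi(\mathcal{D}) = (\Psi, \chi)$. Then,
$\mathcal{V}_{\chi(\Psi)} \subseteq \mathcal{V}_\alpha$ and $\chi(\Psi) \models_i \alpha$.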


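Proof. Since $\mathcal{D}$ is a $\mathcal{C}_L$-derivation, $\mathcal{D}$ consists of a $\mathcal{C}_L$-tree $T = T(\alpha; R, X \Rightarrow g)$
whose leaf $R, X \Rightarrow g$ is closed by an application of $\mathrm{cpl}_0$ with premise $R \models_c g$;
note that $\pi(T) = \pi(\mathcal{D}) = (\Psi, \chi)$. Since $R \models_c g$, by
Lemma 1(i) we get $R \models_i g$, hence $R, X \models_i g$. We
can apply Lemma 2 (with $\Gamma = \emptyset$) and claim that $\mathcal{V}_{\chi(\Psi)} \subseteq \mathcal{V}_\alpha$ and
$\chi(\Psi) \models_i \alpha$.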


Given a $\mathcal{C}_L$-derivation $\mathcal{D}$ of $\Rightarrow \alpha$, Prop. 1 exhibits how to extract a set
$\Psi_\alpha$ of instances of the L-axioms such that $\Psi_\alpha \models_i \alpha$. If $\mathcal{D}$ does not contain
applications of rule $\mathrm{Claus}_1$, $\Psi$ is empty, and this ascertains that $\alpha$ is IPL-valid;
actually, $\mathcal{D}$ can be immediately embedded into the calculus for IPL introduced
in [8]. As an immediate consequence of Prop. 1, we get the soundness of $\mathcal{C}_L$: if
$\Rightarrow \alpha$ is provable in $\mathcal{C}_L$, then $\alpha$ is L-valid.

Even though $\mathcal{C}_L$-derivations have a simple structure, the design of a root-first
proof search strategy for $\mathcal{C}_L$ is far from trivial. After having applied
rule $\mathrm{Claus}_0$ to the root sequent $\Rightarrow \alpha$, we enter a loop where at each iteration
$k$ we search for a derivation of $\sigma_k = R_k, X_k \Rightarrow g$. It is convenient to first
check whether $R_k \models_c g$ so that, by applying rule $\mathrm{cpl}_0$, we immediately close the
derivation at hand. To check classical provability, we exploit a SAT-solver; each
time the solver is invoked, the set $R_k$ has increased, thus it is advantageous to use
an incremental SAT-solver. If $R_k \not\models_c g$, we have to apply either rule $\mathrm{cpl}_1$ or rule
$\mathrm{Claus}_1$, but it is not obvious which strategy should be followed. First, we have to
select one of the two rules. If rule $\mathrm{cpl}_1$ is chosen, we have to guess proper $\lambda$
and $A$; otherwise, we have to apply $\mathrm{Claus}_1$, and this requires the selection of an
instance $\psi$ of an L-axiom. In either case, a blind choice would make the procedure
highly inefficient. To guide proof search, we follow a different approach
based on countermodel construction; to this aim, we introduce a representation
of Kripke models where worlds are classical interpretations ordered by inclusion.


Countermodels. Let $W$ be a finite set of interpretations with minimum $M_0$,
namely: $M_0 \subseteq M$ for every $M \in W$. By $K(W)$ we denote the Kripke model
$\langle W, \leq, M_0, V \rangle$ where $\leq$ coincides with the subset relation $\subseteq$ and $V$ is the identity
map, thus $M \Vdash p$ (in $K(W)$) iff $p \in M$. We introduce the following realizability
relation $\rhd_W$ between elements of $W$ and implication clauses:

$M \rhd_W (a \to b) \to c$ iff ($a \in M$) or ($b \in M$) or ($c \in M$) or
($\exists M' \in W$ s.t. $M \subseteq M'$ and $a \in M'$ and $b \notin M'$).

By $M \rhd_W X$ we mean that $M \rhd_W \lambda$ for every $\lambda \in X$. We state the crucial
properties of the model $K(W)$:


Proposition 2. Let $K(W)$ be the model generated by $W$ and let $w \in W$. Let $\varphi$
be a clause and $\lambda = (a \to b) \to c$ an implication clause.

(i) If $w' \models \varphi$, for every $w' \in W$ such that $w \leq w'$, then $w \Vdash \varphi$.
(ii) If $w \Vdash b \to c$ and $w' \rhd_W \lambda$, for every $w' \in W$ such that $w \leq w'$, then $w \Vdash \lambda$.
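The realizability relation is easy to compute when $W$ is a finite collection of interpretations. A minimal sketch, assuming interpretations are Python frozensets, an implication clause $(a \to b) \to c$ is a triple `(a, b, c)`, and the string `"bot"` (standing for $\bot$) never occurs in an interpretation; the function name `realizes` is hypothetical. The worlds $w_0 = \emptyset$ and $w_1 = \{g, p_0, p_2\}$ and the clauses $\lambda_0 = (p_0 \to \bot) \to p_1$, $\lambda_1 = (a \to \bot) \to p_0$ are those reached after call (2) in Example 3.

```python
def realizes(W, M, lam):
    """M |>_W (a -> b) -> c, where W is a collection of interpretations (frozensets)
    and lam = (a, b, c); the atom "bot" never belongs to an interpretation."""
    a, b, c = lam
    if a in M or b in M or c in M:
        return True
    # Otherwise look for a witness M' in W above M containing a but not b.
    return any(M <= M2 and a in M2 and b not in M2 for M2 in W)


w0, w1 = frozenset(), frozenset({"g", "p0", "p2"})
W = {w0, w1}
lam0, lam1 = ("p0", "bot", "p1"), ("a", "bot", "p0")
print([realizes(W, w, lam) for w in (w0, w1) for lam in (lam0, lam1)])  # [True, False, True, True]
```

The single `False` is the pair $(w_0, \lambda_1)$, which is exactly the pair a proof search would have to repair by adding a world containing $a$.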


Let $K(W)$ be a model with root $r$, and assume that every interpretation $w$ in
$W$ is a model of $R$; our goal is to get $r \Vdash R \cup X$ (where $(X)^* \subseteq R$), possibly by
filling $W$ with new worlds. To this aim, we exploit Prop. 2. By our assumption
and point (i), we get $r \Vdash R$. Suppose that there are $w \in W$ and $\lambda =
(a \to b) \to c \in X$ such that $w \not\rhd_W \lambda$; is it possible to amend $K(W)$ in order to
match (ii) and conclude $r \Vdash X$? By definition of $\rhd_W$, none of the atoms $a$, $b$, $c$
belongs to $w$; moreover $K(W)$ lacks a world $w'$ such that $w \subseteq w'$ and $a \in w'$ and
$b \notin w'$. We can try to fix $K(W)$ by inserting the missing world $w'$; to preserve (i),
we also need $w' \models R$. Accordingly, such a $w'$ exists if and only if $R, w, a \not\models_c b$.
This can be checked by querying a SAT-solver; moreover, if $R, w, a \not\models_c b$, the
solver also computes the required $w'$. This completion process must be iterated
until $K(W)$ has been saturated with all the missing worlds or we get stuck. It
is easy to check that the process eventually terminates. This is one of the key
ideas behind the procedure intuitRIL we present in the next section.


4 The Procedure intuitRIL 


We present the procedure intuitRIL (intuit with Restart for Intermediate
Logics) that, given a formula $\alpha$ and a logic L = IPL + Ax(L), returns either a
set of L-axioms $\Psi_\alpha$ or a model $K(W)$ with the following properties:


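(Q1) If intuitRIL($\alpha$, L) returns $\Psi_\alpha$, then $\Psi_\alpha \subseteq \mathrm{Ax}(L, \mathcal{V}_\alpha)$ and $\Psi_\alpha \models_i \alpha$.
(Q2) If intuitRIL($\alpha$, L) returns $K(W)$, then $K(W)$ is an L-countermodel for $\alpha$.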


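Thus, $\alpha$ is L-valid in the former case, not L-valid in the latter. If intuitRIL($\alpha$, L)
returns $\Psi_\alpha$, by tracing the computation we can build a $\mathcal{C}_L$-derivation $\mathcal{D}$ of $\Rightarrow \alpha$
such that $\Psi_\alpha = \chi(\Psi)$, where $(\Psi, \chi) = \pi(\mathcal{D})$; this certifies that $\Psi_\alpha \models_i \alpha$.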

The procedure is described by the flowchart in Fig. 3 and exploits a single
incremental SAT-solver $s$: clauses can be added to $s$ but not removed; by $R(s)$
we denote the set of clauses stored in $s$. The SAT-solver is required to support
the following operations:

- newSolver($R$) creates a new SAT-solver initialized with the clauses in $R$.
- addClauses($s$, $R$) adds the clauses in $R$ to the SAT-solver $s$.
- satProve($s$, $A$, $g$) calls $s$ to decide whether $R(s), A \models_c g$ ($A$ is a set of
propositional variables). The solver outputs one of the following answers:
  • Yes($A'$): thus, $A' \subseteq A$ and $R(s), A' \models_c g$;
  • No($M$): thus, $A \subseteq M \subseteq \mathcal{V}_{R(s)} \cup A$ and $M \models R(s)$ and $g \notin M$.
In the former case it follows that $R(s), A \models_c g$, in the latter $R(s), A \not\models_c g$.
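For experimenting with tiny inputs, the interface above can be mimicked by a brute-force class. This is a hypothetical stand-in, not a real SAT-solver API: it enumerates interpretations instead of running a DPLL/CDCL search, and it obtains $A'$ by greedily deleting assumptions, whereas an actual incremental solver would typically extract $A'$ from its final conflict. Clauses are pairs (body, head) of atom sets meaning $\bigwedge \mathit{body} \to \bigvee \mathit{head}$, and the string `"bot"` stands for $\bot$. The sample clauses are the set $R_0$ of Example 3.

```python
from itertools import product

BOT = "bot"  # hypothetical encoding of the constant falsum


class BruteForceSolver:
    """Hypothetical, exponential stand-in for the incremental SAT-solver s.
    A clause AND(body) -> OR(head) is a pair of sets of atoms; BOT is never true."""

    def __init__(self, clauses=()):       # newSolver(R)
        self.clauses = set()
        self.add_clauses(clauses)

    def add_clauses(self, clauses):       # addClauses(s, R): clauses are never removed
        self.clauses |= {(frozenset(body), frozenset(head)) for body, head in clauses}

    def _countermodel(self, assumptions, g):
        """Some M with assumptions <= M, M |= R(s) and g not in M, or None."""
        atoms = set(assumptions)
        for body, head in self.clauses:
            atoms |= body | head
        free = sorted(atoms - set(assumptions) - {BOT, g})
        for bits in product([False, True], repeat=len(free)):
            M = frozenset(assumptions) | {v for v, bit in zip(free, bits) if bit}
            if g not in M and all(not body <= M or head & M for body, head in self.clauses):
                return M
        return None

    def sat_prove(self, assumptions, g):  # satProve(s, A, g)
        M = self._countermodel(assumptions, g)
        if M is not None:
            return ("No", M)
        core = set(assumptions)
        for p in sorted(assumptions):     # shrink A to some A' with R(s), A' |=_c g
            if self._countermodel(core - {p}, g) is None:
                core.discard(p)
        return ("Yes", frozenset(core))


# The clause set R0 of Example 3.
R0 = [({"g"}, {"p2"}), ({"p0"}, {"p2"}), ({"a", "p0"}, {BOT}), ({"p1"}, {"p2"}),
      ({"p0", "p1"}, {BOT}), ({"p2"}, {"g"}), ({"p2"}, {"p0", "p1"})]
s = BruteForceSolver(R0)
print(s.sat_prove(set(), "g"))            # ('No', frozenset()): the empty interpretation
print(s.sat_prove({"a", "p0"}, BOT)[0])  # Yes, because a & p0 -> bot is in R0
```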
[Flowchart of intuitRIL: its nodes correspond to Steps (S0)-(S8) described below; the branches labelled basic restart (S8) and semantic restart (S5) lead back to Step (S1).]

Fig. 3. Computation of intuitRIL($\alpha$, L).


The computation of intuitRIL($\alpha$, L) consists of the following steps:

(S0) The formula $\alpha \leftrightarrow g$, with $g$ a new propositional variable, is clausified, yielding
$(R', X', \chi') = \mathrm{Clausify}(\alpha \leftrightarrow g, \mathcal{V}_{\alpha,g})$. The outcome is used to create a new
SAT-solver $s = \mathrm{newSolver}(R' \cup (X')^*)$ and to properly initialize the global variables
$X$ (set of implication clauses), $\Psi$ (set of L-axiom instances), $V$ (set of propositional
variables) and $\chi$ (substitution): $X = X'$, $\Psi = \emptyset$, $V = \mathcal{V}_{R',X',\alpha,g}$ and $\chi = [g \mapsto \alpha] \cdot \chi'$.

(S1) A loop starts (main loop). The SAT-solver $s$ is called to check whether
$R(s) \models_c g$. If the answer is Yes($\emptyset$), the computation stops yielding
$\chi(\Psi)$. Otherwise, the output is No($M$) and the computation continues at
Step (S2).

(S2) We set $r = M$ (the root of $K(W)$) and $W = \{r\}$.

(S3) A loop starts (inner loop). We have to select a pair $(w, \lambda)$ such that $w \in W$,
$\lambda \in X$ and $w \not\rhd_W \lambda$. If such a pair does not exist, the inner loop ends and the
next step is (S4); otherwise the inner loop continues at Step (S6).

(S4) As we show in Lemma 3, at this point $K(W)$ is a countermodel for $\alpha$. If
all the axioms in Ax(L, $V$) are forced at the root $r$ of $K(W)$, then $K(W)$
is an L-countermodel for $\alpha$ and the computation ends returning $K(W)$.
Otherwise, we select $\psi$ from Ax(L, $V$) such that $r \nVdash \psi$ and the computation
continues at Step (S5); we call $\psi$ the learned axiom.

(S5) We clausify $\psi$, obtaining $(R', X', \chi') = \mathrm{Clausify}(\psi, V)$, we add $R' \cup (X')^*$ to $s$
and we update the global variables: $X = X \cup X'$, $\Psi = \Psi \cup \{\psi\}$, $V = V \cup \mathcal{V}_{R',X'}$
and $\chi = \chi \cdot \chi'$. The computation restarts from Step (S1) with a new iteration
of the main loop (semantic restart).

(S6) Let $(w, (a \to b) \to c)$ be the pair selected at Step (S3). The SAT-solver $s$ is
called to check whether $R(s), w, a \models_c b$. If the result is No($M$), the inner
loop continues at Step (S7). Otherwise, the answer is Yes($A$); the inner
loop ends and the computation continues at Step (S8).

(S7) The interpretation $M$ is added to $W$ and the computation continues at
Step (S3) with a new iteration of the inner loop.

(S8) The clause $\varphi = \bigwedge(A \setminus \{a\}) \to c$ (learned basic clause) is added to the SAT-solver $s$
and the computation restarts from Step (S1) (basic restart).


Intuitively, intuitRIL($\alpha$, L) searches for an L-countermodel $K(W)$ for $\alpha$. In the
construction of $K(W)$, whenever a conflict arises, a restart operation is triggered.
A basic restart happens when it is not possible to fill the set $W$ with a missing
world (see the discussion after Prop. 2). A semantic restart is triggered when
$K(W)$ is a countermodel for $\alpha$ but it fails to be an L-model. In either case, the
construction of $K(W)$ restarts from scratch. However, to prevent the same
kind of conflict from showing up again, new clauses are learned and fed to the SAT-solver
(this complies with the DPLL(T) with learning computation paradigm [16]). If the
outcome is $\chi(\Psi)$, by tracing the computation we can build a $\mathcal{C}_L$-derivation $\mathcal{D}$
of $\Rightarrow \alpha$ such that $\pi(\mathcal{D}) = (\Psi, \chi)$. The derivation is built bottom-up. The initial
Step (S0) corresponds to the application of rule $\mathrm{Claus}_0$ to the root sequent $\Rightarrow \alpha$;
basic and semantic restarts expand the derivation bottom-up by applying rules
$\mathrm{cpl}_1$ and $\mathrm{Claus}_1$, respectively. We stress that the procedure is quite modular; to
treat a specific logic L one only has to provide a concrete implementation of
Step (S4). For L = IPL, Step (S4) is trivial, since the set Ax(IPL, $V$) is empty.
Actually, intuitRIL applied to IPL has the same behaviour as the procedure
intuitR introduced in [8].
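The following Python sketch mirrors Steps (S1)-(S3) and (S6)-(S8) for L = IPL, where Step (S4) never learns an axiom. It is only an illustration under several assumptions: clausification (Step (S0)) is assumed to have been done already, the incremental SAT-solver is replaced by brute-force enumeration (so it is usable on tiny inputs only), $A'$ in a Yes answer is obtained by greedy deletion of assumptions, and the encoding and all names (`countermodel`, `sat_prove`, `realizes`, `intuit_ipl`) are hypothetical.

```python
from itertools import product

BOT = "bot"  # hypothetical encoding of the constant falsum


def countermodel(R, A, g):
    """Brute-force stand-in for a SAT call: an interpretation M with A <= M,
    M satisfying every clause of R and g not in M, or None if there is none."""
    atoms = set(A)
    for body, head in R:
        atoms |= body | head
    free = sorted(atoms - set(A) - {BOT, g})
    for bits in product([False, True], repeat=len(free)):
        M = frozenset(A) | {v for v, bit in zip(free, bits) if bit}
        if g not in M and all(not body <= M or head & M for body, head in R):
            return M
    return None


def sat_prove(R, A, g):
    """Mimics satProve: ("No", M), or ("Yes", A2) with A2 a subset of A and R, A2 |=_c g."""
    M = countermodel(R, A, g)
    if M is not None:
        return "No", M
    core = set(A)
    for p in sorted(A):  # greedily drop assumptions that are not needed
        if countermodel(R, core - {p}, g) is None:
            core.discard(p)
    return "Yes", frozenset(core)


def realizes(W, w, lam):
    """w |>_W (a -> b) -> c, with lam = (a, b, c)."""
    a, b, c = lam
    return a in w or b in w or c in w or any(w <= v and a in v and b not in v for v in W)


def intuit_ipl(R, X, g):
    """Main and inner loops for L = IPL; R: clauses (body, head), X: triples (a, b, c)."""
    R = {(frozenset(body), frozenset(head)) for body, head in R}
    R |= {(frozenset({b}), frozenset({c})) for _a, b, c in X}  # R(s) includes (X)*
    while True:                                                # (S1) main loop
        answer, M = sat_prove(R, set(), g)
        if answer == "Yes":
            return "valid", None
        W = {M}                                                # (S2) root r = M
        while True:                                            # (S3) inner loop
            pair = next(((w, lam) for w in W for lam in X
                         if not realizes(W, w, lam)), None)
            if pair is None:
                return "countermodel", W                       # (S4) K(W) refutes g
            w, (a, b, c) = pair
            answer, res = sat_prove(R, w | {a}, b)             # (S6)
            if answer == "No":
                W.add(res)                                     # (S7) add the missing world
            else:
                R.add((res - {a}, frozenset({c})))             # (S8) learn AND(A - {a}) -> c
                break                                          # basic restart


# alpha = a -> a, clausified by hand: X = {(a -> a) -> g} and R = {g & a -> a}.
print(intuit_ipl([({"g", "a"}, {"a"})], [("a", "a", "g")], "g"))  # ('valid', None)

# R0 and X0 of Example 3: wem is not IPL-valid, so a countermodel is returned.
R0 = [({"g"}, {"p2"}), ({"p0"}, {"p2"}), ({"a", "p0"}, {BOT}), ({"p1"}, {"p2"}),
      ({"p0", "p1"}, {BOT}), ({"p2"}, {"g"}), ({"p2"}, {"p0", "p1"})]
X0 = [("p0", BOT, "p1"), ("a", BOT, "p0")]
kind, W = intuit_ipl(R0, X0, "g")
print(kind, sorted(map(sorted, W)))  # countermodel [[], ['a', 'g', 'p1', 'p2'], ['g', 'p0', 'p2']]
```

With this particular enumeration order the countermodel for wem happens to coincide with the non-linear model $\{w_0, w_1, w_2\}$ of Example 3; a GL implementation would additionally plug an axiom-selection routine into Step (S4).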


Example 3. Let us consider the Jankov axiom $\mathrm{wem} = \neg a \lor \neg\neg a$ [2,13] (aka weak
excluded middle), which holds in all frames having a single maximal world (thus,
wem is GL-valid). The trace of the execution of intuitRIL(wem, GL) is shown
in Fig. 4. The initial clausification yields $(R_0, X_0, \chi_0)$, where $X_0$ consists of the
implication clauses $\lambda_0$, $\lambda_1$ in Fig. 4 and $R_0$ contains the 7 clauses below:

$g \to p_2$,  $p_0 \to p_2$,  $a \land p_0 \to \bot$,  $p_1 \to p_2$,  $p_0 \land p_1 \to \bot$,  $p_2 \to g$,  $p_2 \to p_0 \lor p_1$.

Each row in Fig. 4 displays the validity tests performed by the SAT-solver
and the computed answers. If the result is No($M$), the last two columns show
the worlds $w_k$ in the current set $W$ and, for each $w_k$, the list of $\lambda$ such that
$w_k \not\rhd_W \lambda$; the pair selected for the next step is underlined. For instance, after
call (1) we have $W = \{w_0\}$, $w_0 \not\rhd_W \lambda_0$ and $w_0 \not\rhd_W \lambda_1$; the selected pair is $(w_0, \lambda_0)$.
After call (2), the set $W$ is updated by adding the world $w_1$; we have $w_1 \rhd_W \lambda_0$,
$w_1 \rhd_W \lambda_1$, $w_0 \rhd_W \lambda_0$ and $w_0 \not\rhd_W \lambda_1$. Whenever the SAT-solver outputs Yes($A$),
we display the learned clause $\varphi_k$. The SAT-solver is invoked 18 times and there
are 6 restarts (1 semantic, 5 basic). After (3), we get $W = \{w_0, w_1, w_2\}$ and no
pair $(w, \lambda)$ can be selected, hence the model $K(W)$ (displayed in the figure) is
a countermodel for wem. However, $K(W)$ is not a GL-model (indeed, it is not
linear), hence we choose an instance of the linearity axiom not forced at $w_0$,
namely $\psi_0$, and we force a semantic restart. The clausification of $\psi_0$ produces 6
new clauses and the new implication clauses $\lambda_2$, $\lambda_3$, $\lambda_4$. After each restart, the
sets $R_j$ are:

$R_1 = R_0 \cup \{p_3 \to p_4,\ a \to p_5,\ p_5 \land p_3 \to a,\ a \land p_4 \to p_3,\ a \land p_3 \to \bot,\ p_4 \lor p_5\}$
$R_j = R_{j-1} \cup \{\varphi_{j-1}\}$ for $2 \leq j \leq 6$ (the $\varphi_j$ are defined in Fig. 4).
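As a sanity check of the sets $R_j$ above, the following brute-force sketch verifies that $g$ is a classical consequence of $R_6$ but of no earlier $R_j$, in agreement with the answers to the calls $R_j \models_c g$? in Fig. 4. The learned clauses $\varphi_1, \dots, \varphi_5$ are those listed in Fig. 4; the clause encoding (pairs of frozensets, `"bot"` for $\bot$) and the helper names `clause` and `entails_atom` are hypothetical.

```python
from itertools import product

BOT = "bot"  # hypothetical encoding of the constant falsum


def clause(body, head):
    """The clause AND(body) -> OR(head) as a pair of frozensets."""
    return (frozenset(body), frozenset(head))


def entails_atom(R, g):
    """R |=_c g, checked by enumerating every interpretation of the atoms of R."""
    atoms = sorted({p for body, head in R for p in body | head} - {BOT})
    for bits in product([False, True], repeat=len(atoms)):
        M = {p for p, bit in zip(atoms, bits) if bit}
        if g not in M and all(not body <= M or head & M for body, head in R):
            return False
    return True


R0 = {clause({"g"}, {"p2"}), clause({"p0"}, {"p2"}), clause({"a", "p0"}, {BOT}),
      clause({"p1"}, {"p2"}), clause({"p0", "p1"}, {BOT}), clause({"p2"}, {"g"}),
      clause({"p2"}, {"p0", "p1"})}
R1 = R0 | {clause({"p3"}, {"p4"}), clause({"a"}, {"p5"}), clause({"p5", "p3"}, {"a"}),
           clause({"a", "p4"}, {"p3"}), clause({"a", "p3"}, {BOT}), clause(set(), {"p4", "p5"})}
learned = [clause({"p0"}, {"p3"}), clause({"a"}, {"p1"}), clause({"p3"}, {"p0"}),
           clause({"p5"}, {"p1"}), clause({"p4"}, {"p0"})]  # phi_1, ..., phi_5
R = [R0, R1]
for phi in learned:
    R.append(R[-1] | {phi})                                   # R_j = R_{j-1} U {phi_{j-1}}
print([entails_atom(Rj, "g") for Rj in R])  # [False, False, False, False, False, False, True]
```

The last two learned clauses close both branches of $p_4 \lor p_5$: $p_4 \to p_0$ and $p_5 \to p_1$ each lead to $p_2$ and hence to $g$.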


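The $\mathcal{C}_{\mathrm{GL}}$-derivation of $\Rightarrow \neg a \lor \neg\neg a$ extracted from the computation is the
following (read from the root upwards; each line gives a conclusion, the applied rule and its premises):

$\Rightarrow \neg a \lor \neg\neg a$ by $\mathrm{Claus}_0(g, \chi_0)$ from $R_0, X_0 \Rightarrow g$;
$R_0, X_0 \Rightarrow g$ by $\mathrm{Claus}_1(\psi_0, \chi_1)$ from $R_1, X_1 \Rightarrow g$;
$R_1, X_1 \Rightarrow g$ by $\mathrm{cpl}_1(\lambda_3)$ from $R_1, a, p_0 \models_c \bot$ and $R_2, X_1 \Rightarrow g$;
$R_2, X_1 \Rightarrow g$ by $\mathrm{cpl}_1(\lambda_0)$ from $R_2, a, p_0 \models_c \bot$ and $R_3, X_1 \Rightarrow g$;
$R_3, X_1 \Rightarrow g$ by $\mathrm{cpl}_1(\lambda_1)$ from $R_3, a, p_3 \models_c \bot$ and $R_4, X_1 \Rightarrow g$;
$R_4, X_1 \Rightarrow g$ by $\mathrm{cpl}_1(\lambda_0)$ from $R_4, p_0, p_5 \models_c \bot$ and $R_5, X_1 \Rightarrow g$;
$R_5, X_1 \Rightarrow g$ by $\mathrm{cpl}_1(\lambda_1)$ from $R_5, a, p_4 \models_c \bot$ and $R_6, X_1 \Rightarrow g$;
$R_6, X_1 \Rightarrow g$ by $\mathrm{cpl}_0$ from $R_6 \models_c g$.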


Now, we discuss partial correctness and termination of intuitRIL. Let us
denote with $\sim_c$ classical equivalence ($\alpha \sim_c \beta$ iff $\models_c \alpha \leftrightarrow \beta$) and with $\sim_i$
intuitionistic equivalence ($\alpha \sim_i \beta$ iff $\models_i \alpha \leftrightarrow \beta$). We introduce some notation.


(†) The following terms refer to the configuration at the beginning of iteration
$k$ ($k \geq 0$) of the main loop, just after the execution of Step (S2):
- $B_k$ is the set collecting all the learned basic clauses;
- $R_k$ is the set of clauses stored in the SAT-solver $s$;
- $X_k$, $\Psi_k$, $V_k$, $\chi_k$, $r_k$ are the values of the corresponding global variables.


In Fig. 5 we inductively define the $\mathcal{C}_L$-tree $T_k$, having the form $T(\alpha; R_k, X_k \Rightarrow g)$.
In the application of rule $\mathrm{Claus}_0$, $g$ and $\chi'$ are defined as in Step (S0). In rule
$\mathrm{cpl}_1$, $\lambda$ is the implication clause selected at iteration $k-1$ (of the main loop)
in the last execution of Step (S3); $A$ is the value computed at Step (S6) of
iteration $k-1$. In the application of rule $\mathrm{Claus}_1$, $\psi$ and $\chi'$ are defined as in the
execution of Steps (S4) and (S5) of iteration $k-1$. One can easily check that the
side conditions of all rule applications are satisfied. If Step (S1) yields Yes($\emptyset$), we can turn $T_k$
into a $\mathcal{C}_L$-derivation by applying rule $\mathrm{cpl}_0$.

The next lemma states some relevant properties of the computations of
intuitRIL.


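Lemma 3. Let us consider the execution of iteration $k$ of the main loop ($k \geq 0$).

(i) $(X_k)^* \cup B_k \subseteq R_k$.
(ii) $V_k = \mathcal{V}_{T_k}$ and $\Psi_k \subseteq \mathrm{Ax}(L, V_k)$ and $\pi(T_k) = (\Psi_k, \chi_k)$.
(iii) $\mathcal{V}_{\chi_k(p)} \subseteq \mathcal{V}_\alpha$, for every $p \in V_k$, and $R_k, X_k \models_i \beta \leftrightarrow \chi_k(\beta)$, for every $\beta$.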
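$\lambda_0 = (p_0 \to \bot) \to p_1$   $\lambda_1 = (a \to \bot) \to p_0$
$\lambda_2 = (a \to p_3) \to p_4$   $\lambda_3 = (a \to \bot) \to p_3$   $\lambda_4 = (p_3 \to a) \to p_5$

$w_0 = \emptyset$  $w_1 = \{g, p_0, p_2\}$  $w_2 = \{a, g, p_1, p_2\}$  $w_3 = \{p_4\}$  $w_4 = \{g, p_0, p_2, p_4\}$
$w_5 = \{g, p_0, p_2, p_3, p_4\}$  $w_6 = \{a, p_5\}$  $w_7 = \{p_3, p_4\}$  $w_8 = \{g, p_0, p_2, p_3, p_4\}$
$w_9 = \{p_5\}$  $w_{10} = \{p_4\}$  $w_{11} = \{g, p_0, p_2, p_3, p_4\}$

$\chi_0 = [g \mapsto \neg a \lor \neg\neg a,\ p_0 \mapsto \neg a,\ p_1 \mapsto \neg\neg a,\ p_2 \mapsto \neg a \lor \neg\neg a]$
$\chi_1 = [p_3 \mapsto \neg a,\ p_4 \mapsto a \to \neg a,\ p_5 \mapsto \neg a \to a]$

$$\begin{array}{lllll}
 & \text{SAT call} & \text{Answer} & w \in W & \lambda \text{ s.t. } w \not\rhd_W \lambda \\
\text{Start} & (1)\ R_0 \models_c g? & \mathrm{No}(w_0) & w_0 & \underline{\lambda_0}, \lambda_1 \\
 & (2)\ R_0, w_0, p_0 \models_c \bot? & \mathrm{No}(w_1) & w_1 & \emptyset \\
 & & & w_0 & \underline{\lambda_1} \\
 & (3)\ R_0, w_0, a \models_c \bot? & \mathrm{No}(w_2) & w_2 & \emptyset \\
 & & & w_1 & \emptyset \\
 & & & w_0 & \emptyset \\
\text{Semantic failure} & \multicolumn{4}{l}{K(W) \text{ has root } w_0 \text{ and maximal worlds } w_1, w_2;\ \text{learned axiom } \psi_0 = (a \to \neg a) \lor (\neg a \to a)} \\
\text{SRest 1} & (4)\ R_1 \models_c g? & \mathrm{No}(w_3) & w_3 & \underline{\lambda_0}, \lambda_1, \lambda_3, \lambda_4 \\
 & (5)\ R_1, w_3, p_0 \models_c \bot? & \mathrm{No}(w_4) & w_4 & \lambda_3, \underline{\lambda_4} \\
 & & & w_3 & \lambda_1, \lambda_3, \lambda_4 \\
 & (6)\ R_1, w_4, p_3 \models_c a? & \mathrm{No}(w_5) & w_5 & \emptyset \\
 & & & w_4 & \underline{\lambda_3} \\
 & & & w_3 & \lambda_1, \lambda_3 \\
 & (7)\ R_1, w_4, a \models_c \bot? & \mathrm{Yes}(\{a, p_0\}) & \multicolumn{2}{l}{\varphi_1 = p_0 \to p_3} \\
\text{BRest 2} & (8)\ R_2 \models_c g? & \mathrm{No}(w_6) & w_6 & \underline{\lambda_0} \\
 & (9)\ R_2, w_6, p_0 \models_c \bot? & \mathrm{Yes}(\{a, p_0\}) & \multicolumn{2}{l}{\varphi_2 = a \to p_1} \\
\text{BRest 3} & (10)\ R_3 \models_c g? & \mathrm{No}(w_7) & w_7 & \underline{\lambda_0}, \lambda_1 \\
 & (11)\ R_3, w_7, p_0 \models_c \bot? & \mathrm{No}(w_8) & w_8 & \emptyset \\
 & & & w_7 & \underline{\lambda_1} \\
 & (12)\ R_3, w_7, a \models_c \bot? & \mathrm{Yes}(\{a, p_3\}) & \multicolumn{2}{l}{\varphi_3 = p_3 \to p_0} \\
\text{BRest 4} & (13)\ R_4 \models_c g? & \mathrm{No}(w_9) & w_9 & \underline{\lambda_0}, \lambda_1, \lambda_2, \lambda_3 \\
 & (14)\ R_4, w_9, p_0 \models_c \bot? & \mathrm{Yes}(\{p_0, p_5\}) & \multicolumn{2}{l}{\varphi_4 = p_5 \to p_1} \\
\text{BRest 5} & (15)\ R_5 \models_c g? & \mathrm{No}(w_{10}) & w_{10} & \underline{\lambda_0}, \lambda_1, \lambda_3, \lambda_4 \\
 & (16)\ R_5, w_{10}, p_0 \models_c \bot? & \mathrm{No}(w_{11}) & w_{11} & \emptyset \\
 & & & w_{10} & \underline{\lambda_1}, \lambda_3 \\
 & (17)\ R_5, w_{10}, a \models_c \bot? & \mathrm{Yes}(\{a, p_4\}) & \multicolumn{2}{l}{\varphi_5 = p_4 \to p_0} \\
\text{BRest 6} & (18)\ R_6 \models_c g? & \mathrm{Yes}(\emptyset) & \multicolumn{2}{l}{\text{Proved}}
\end{array}$$

Fig. 4. Computation of intuitRIL($\neg a \lor \neg\neg a$, GL).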
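$T_0$: the tree with root $\Rightarrow \alpha$ and leaf $R_0, X_0 \Rightarrow g$, consisting of one application of $\mathrm{Claus}_0(g, \chi')$.
$T_k$, if $k > 0$ and iteration $k-1$ ends with a basic restart (thus $X_k = X_{k-1}$): $T_{k-1}$ extended at its leaf $R_{k-1}, X_{k-1} \Rightarrow g$ by an application of $\mathrm{cpl}_1(\lambda)$ with premises $R_{k-1}, A \models_c b$ and $R_k, X_k \Rightarrow g$.
$T_k$, if $k > 0$ and iteration $k-1$ ends with a semantic restart: $T_{k-1}$ extended at its leaf $R_{k-1}, X_{k-1} \Rightarrow g$ by an application of $\mathrm{Claus}_1(\psi, \chi')$ with premise $R_k, X_k \Rightarrow g$.

Fig. 5. Definition of $T_k$ ($k \geq 0$).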


(iv) At every step after (S2), $w \models R_k$, for every $w \in W$.
(v) At every step after (S2), $r_k$ is the root of $K(W)$ and $r_k \Vdash R_k$ and $r_k \nVdash g$.
(vi) At Step (S4), $r_k \Vdash R_k \cup X_k \cup \Psi_k$ and $r_k \nVdash g$ (in $K(W)$).
(vii) Assume that iteration $k$ ends with a basic restart and let $\varphi$ be the learned
basic clause. For every $\varphi' \in B_k$, $\varphi \not\sim_c \varphi'$.
(viii) Assume that iteration $k$ ends with a semantic restart and let $\psi$ be the
learned axiom. For every $\psi' \in \Psi_k$, $\chi_k(\psi) \not\sim_i \chi_k(\psi')$.


Proof. We only sketch the proof of the non-trivial points.

(iii). By Lemma 2 applied to $T_k$.

(v). Every interpretation $M$ generated at Step (S6) is a superset of $r_k$, thus
after Step (S2) $r_k$ is the minimum element of $W$ and the root of $K(W)$. By (iv)
and Prop. 2(i), $r_k \Vdash R_k$. Since $g \notin r_k$, we get $r_k \nVdash g$.

(vi). At Step (S4), $w \rhd_W \lambda$ for every $w \in W$ and $\lambda \in X_k$. Since $(X_k)^* \subseteq R_k$,
by Prop. 2(ii) we get $r_k \Vdash X_k$. Let $\psi \in \Psi_k$; then, $\psi$ has been learned at some
iteration $k' < k$. Let $(R', X', \chi')$ be the output of Clausify($\psi$, $V$) at Step (S5)
of iteration $k'$. Since $R' \subseteq R_k$ and $X' \subseteq X_k$, it holds that $r_k \Vdash R' \cup X'$. By (P1)
$R', X' \models_i \psi$, hence $r_k \Vdash \psi$, which proves $r_k \Vdash \Psi_k$.

(vii). Let $\varphi' \in B_k$; we show that $\varphi \not\sim_c \varphi'$. Let $\varphi = \bigwedge(A \setminus \{a\}) \to c$; then,
there are $w \in W$ and $\lambda = (a \to b) \to c \in X_k$ such that $(w, \lambda)$ has been selected
at Step (S3) and the outcome of satProve($s$, $w \cup \{a\}$, $b$) at Step (S6) is Yes($A$).
Note that $w \not\rhd_W \lambda$, hence $c \notin w$; since $A \subseteq w \cup \{a\}$, we get $w \not\models \varphi$. On the other
hand, $w \models \varphi'$, since $\varphi' \in B_k$, $B_k \subseteq R_k$ and $w \models R_k$ by (iv). We conclude $\varphi \not\sim_c \varphi'$.

(viii). Let $\psi' \in \Psi_k$ and let $K(W)$ be the model obtained at Step (S4) of
iteration $k$. By (iii) $R_k, X_k \models_i \psi \leftrightarrow \chi_k(\psi)$ and $R_k, X_k \models_i \psi' \leftrightarrow \chi_k(\psi')$. Since
$r_k \nVdash \psi$ and $r_k \Vdash \psi'$ (indeed, $\psi' \in \Psi_k$ and $r_k \Vdash \Psi_k$) and $r_k \Vdash R_k \cup X_k$, we get
$r_k \nVdash \chi_k(\psi)$ and $r_k \Vdash \chi_k(\psi')$. We conclude $\chi_k(\psi) \not\sim_i \chi_k(\psi')$.


The following proposition proves the partial correctness of intuitRIL: 


Proposition 3. intuitRIL($\alpha$, L) satisfies properties (Q1) and (Q2).


Proof. Let us assume that the computation ends at iteration $k$ with output
$\Psi_\alpha$. Then, the call to the SAT-solver at Step (S1) yields Yes($\emptyset$), meaning that
$R_k \models_c g$. We can build the $\mathcal{C}_L$-derivation $\mathcal{D}$ of $\Rightarrow \alpha$ obtained from $T_k$ by closing
its leaf $R_k, X_k \Rightarrow g$ with an application of $\mathrm{cpl}_0$ (premise $R_k \models_c g$).

Note that $\Psi_\alpha = \chi_k(\Psi_k)$. Accordingly, by Prop. 1 we get (Q1).
Let us assume that the output is the model $K(W)$, having root $r$. Then, $K(W)$
is an L-model (otherwise, Step (S4) would have forced a semantic restart). By
Lemma 3(vi) we get $r \Vdash R_0 \cup X_0$ and $r \nVdash g$. Since at Step (S0) we have clausified
the formula $\alpha \leftrightarrow g$, by (P1) we get $R_0, X_0 \models_i \alpha \leftrightarrow g$, which implies $r \Vdash \alpha \leftrightarrow g$.
We conclude that $r \nVdash \alpha$, hence (Q2) holds.


It seems challenging to provide a general proof of termination, and each logic
must be treated separately. We can only state some general properties about the
termination of the inner loop and of consecutive basic restarts.


Proposition 4. (i) The inner loop is terminating. 
(ii) The number of consecutive basic restarts is finite. 


Proof. Let us assume, by contradiction, that the inner loop is not terminating. For
every $j \geq 0$, by $W_j$ we denote the value of $W$ at Step (S3) of iteration $j$ of
the inner loop; note that the value of the variable $V$ does not change during the
iterations. We show that $W_j \subset W_{j+1}$, for every $j \geq 0$. At iteration $j$, the outcome
of Step (S6) is No($M$). Thus, there are $w \in W_j$ and $\lambda = (a \to b) \to c \in X$ such
that the pair $(w, \lambda)$ has been selected at Step (S3); accordingly, $w \not\rhd_{W_j} \lambda$ and
$w \cup \{a\} \subseteq M$ and $b \notin M$. We have $M \notin W_j$, otherwise we would get $w \rhd_{W_j} \lambda$, a
contradiction. Since $W_{j+1} = W_j \cup \{M\}$, this proves that $W_j \subset W_{j+1}$. We have
shown that $W_0 \subset W_1 \subset W_2 \subset \cdots$. This leads to a contradiction since, for every
$j \geq 0$ and every $w \in W_j$, $w$ is a subset of $V$ and $V$ is finite. We conclude that
the inner loop is terminating, and this proves (i).

Let us assume, by contradiction, that there is an infinite sequence of consecutive
basic restarts. Then, there is $n \geq 0$ such that, for every $k \geq n$, the iteration
$k$ of the main loop ends with a basic restart. Let $\varphi_k$ be the clause learned at
iteration $k$. Note that an iteration ending with a basic restart does not introduce
new atoms, thus $\mathcal{V}_{\varphi_k} \subseteq V_n$ for every $k \geq n$ (where $V_n$ is defined as in (†)). We
get a contradiction, since $V_n$ is finite and, by Lemma 3(vii), the clauses $\varphi_k$ are
pairwise non $\sim_c$-equivalent; this proves (ii).


Lemma 3(viii) guarantees that the learned axioms are pairwise distinct, but this
is not sufficient to prove termination since in general we cannot set a bound on
the size and on the number of learned axioms. In the next section we present some
relevant logics where the procedure is terminating.
5 Termination


Let GL = IPL + lin be the Gödel-Dummett logic presented in Ex. 1; we show
that every call intuitRIL($\alpha$, GL) is terminating. To this aim, we exploit the
bounding function $\mathrm{Ax}_{\mathrm{GL}}(\alpha)$ presented in the mentioned example.


Lemma 4. Let us consider the computation of intuitRIL($\alpha$, GL) and assume
that at iteration $k$ of the main loop Step (S4) is executed and that the obtained
model $K(W)$ is not linear. Then, there exists $\psi \in \mathrm{Ax}_{\mathrm{GL}}(\alpha)$ such that $r_k \nVdash \psi$.


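Proof. Let us assume that $K(W)$ has two distinct maximal worlds $w_1$ and $w_2$;
note that $w_1 \subseteq V_k$ and $w_2 \subseteq V_k$ (with $V_k$ defined as in (†)). We show that:

(a) $w_1 \cap \mathcal{V}_\alpha \neq w_2 \cap \mathcal{V}_\alpha$.

Suppose by contradiction $w_1 \cap \mathcal{V}_\alpha = w_2 \cap \mathcal{V}_\alpha$; let $p \in V_k$ and $\beta = \chi_k(p)$ (with
$\chi_k$ defined as in (†)). By Lemma 3(iii), $R_k, X_k \models_i p \leftrightarrow \beta$; by Lemma 3(vi) and persistence
we get $w_1 \Vdash p \leftrightarrow \beta$ and $w_2 \Vdash p \leftrightarrow \beta$. Since $\mathcal{V}_\beta \subseteq \mathcal{V}_\alpha$ (see Lemma 3(iii)),
forcing at a maximal world coincides with classical truth, and
we are assuming $w_1 \cap \mathcal{V}_\alpha = w_2 \cap \mathcal{V}_\alpha$, it holds that $w_1 \Vdash \beta$ iff $w_2 \Vdash \beta$, thus
$w_1 \Vdash p$ iff $w_2 \Vdash p$, namely $p \in w_1$ iff $p \in w_2$. Since $p$ is any element of $V_k$, we
get $w_1 = w_2$, a contradiction; this proves (a). By (a) there is $a \in \mathcal{V}_\alpha$ such that
either $a \in w_1 \setminus w_2$ or $a \in w_2 \setminus w_1$. We consider the former case (the latter one
is symmetric), corresponding to Case 1 in Fig. 6. We have $w_1 \Vdash a$ and $w_2 \Vdash \neg a$;
setting $\psi = (a \to \neg a) \lor (\neg a \to a)$, we conclude $r_k \nVdash \psi$.

Assume that $K(W)$ has only one maximal world; since it is not linear, there
are three distinct worlds $w_1, w_2, w_3$ as in Case 2 in Fig. 6, namely: $w_1$ is an
immediate successor of $w_2$ and $w_3$ (i.e., for $j \in \{2, 3\}$, $w_j < w_1$ and, if $w_j < w$,
then $w_1 \leq w$), $w_2 \not\leq w_3$ and $w_3 \not\leq w_2$. Reasoning as in (a), we get:

(b) $w_2 \cap \mathcal{V}_\alpha \neq w_3 \cap \mathcal{V}_\alpha$.   (c) $w_2 \cap \mathcal{V}_\alpha \subset w_1 \cap \mathcal{V}_\alpha$ and $w_3 \cap \mathcal{V}_\alpha \subset w_1 \cap \mathcal{V}_\alpha$.

By (b) there is $a \in \mathcal{V}_\alpha$ such that either $a \in w_2 \setminus w_3$ or $a \in w_3 \setminus w_2$. Let us
consider the former case (the latter one is symmetric). By (c), there is $b \in \mathcal{V}_\alpha$
such that $b \in w_1 \setminus w_2$. If $b \in w_3$ (Case 2.1 in Fig. 6), we get $a \in w_2$, $b \notin w_2$,
$a \notin w_3$, $b \in w_3$. Setting $\psi = (a \to b) \lor (b \to a)$, we conclude $r_k \nVdash \psi$. Finally,
let us assume $b \notin w_3$ (Case 2.2). We have $\{a, b\} \subseteq w_1$, $a \in w_2$, $b \notin w_2$, $a \notin w_3$
and $b \notin w_3$. It is easy to check that $w_3 \Vdash a \to b$ (recall that $w_3 < w$ implies
$w_1 \leq w$), thus $w_3 \nVdash (a \to b) \to a$. On the other hand $w_2 \nVdash a \to (a \to b)$. Setting
$\psi = (a \to (a \to b)) \lor ((a \to b) \to a)$, we get $r_k \nVdash \psi$.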


We exploit Lemma 4 to implement Step (S4). If $K(W)$ is linear, then $K(W)$ is a
GL-model and we are done. Otherwise, the proof of Lemma 4 suggests an effective
method to select an instance $\psi$ of lin from $\mathrm{Ax}_{\mathrm{GL}}(\alpha)$.
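Rather than following the case analysis of the proof, the sketch below implements Step (S4) for GL by scanning $\mathrm{Ax}_{\mathrm{GL}}(\alpha)$ and returning the first instance not forced at the root of $K(W)$; Lemma 4 guarantees that such an instance exists whenever $K(W)$ is not linear. This is a simpler, and for large $\mathcal{V}_\alpha$ slower, substitute for the method suggested by the proof. The tuple encoding of formulas (`"bot"` for $\bot$), the omission of the IPL-valid instances with $a = b$, and all names (`ax_gl`, `forces`, `select_lin_instance`) are assumptions made for illustration. The test model is the non-linear $K(W)$ reached after call (3) of Example 3.

```python
from itertools import permutations

BOT = "bot"  # hypothetical encoding: variables are strings, "bot" is falsum


def imp(x, y):
    return ("imp", x, y)


def lin(x, y):
    return ("or", imp(x, y), imp(y, x))


def ax_gl(vs):
    """Ax_GL(alpha) for V_alpha = vs, omitting the IPL-valid instances with a = b."""
    vs = sorted(vs)
    out = [lin(a, imp(a, BOT)) for a in vs]
    for a, b in permutations(vs, 2):
        out += [lin(a, b), lin(a, imp(a, b))]
    return out


def forces(W, w, f):
    """w ||- f in K(W): worlds are frozensets ordered by inclusion, valuation = identity."""
    if isinstance(f, str):
        return f != BOT and f in w
    op, left, right = f
    if op == "and":
        return forces(W, w, left) and forces(W, w, right)
    if op == "or":
        return forces(W, w, left) or forces(W, w, right)
    return all(not forces(W, v, left) or forces(W, v, right) for v in W if w <= v)


def select_lin_instance(W, v_alpha):
    """Step (S4) for GL: an instance in Ax_GL(alpha) not forced at the root of K(W),
    or None if every instance is forced (in particular when K(W) is linear)."""
    root = min(W, key=len)  # the minimum of W w.r.t. inclusion has the fewest elements
    return next((psi for psi in ax_gl(v_alpha) if not forces(W, root, psi)), None)


# The non-linear model reached after call (3) of Example 3.
W = {frozenset(), frozenset({"g", "p0", "p2"}), frozenset({"a", "g", "p1", "p2"})}
print(select_lin_instance(W, {"a"}))  # the tuple encoding of (a -> ~a) v (~a -> a)
```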


Proposition 5. The computation of intuitRIL($\alpha$, GL) is terminating.

Proof. Assume that intuitRIL($\alpha$, GL) is not terminating. Since the number of
iterations of the inner loop and of the consecutive basic restarts is finite (see
Prop. 4), Step (S4) must be executed infinitely many times. This leads to a
contradiction, since the axioms selected at Step (S4) are pairwise distinct (see
Lemma 3(viii)) and such axioms are chosen from the finite set $\mathrm{Ax}_{\mathrm{GL}}(\alpha)$.
Fig. 6. Proof of Lemma 4, case analysis.


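As a corollary, we get that $\mathrm{Ax}_{\mathrm{GL}}(\alpha)$ is a bounding function for GL:
Proposition 6. If $\alpha$ is GL-valid, there is $\Psi \subseteq \mathrm{Ax}_{\mathrm{GL}}(\alpha)$ such that $\Psi \vdash_{\mathrm{IPL}} \alpha$.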


Other proof-search strategies for GL are discussed in [10,14]. This technique
can be extended to other notable intermediate logics. Among these, we recall
the logics $\mathrm{GL}_n$ (Gödel Logic of depth $n$), obtained by adding to GL the axioms
$bd_n$ (bounded depth), where $bd_0 = a_0 \vee \neg a_0$ and $bd_{n+1} = a_{n+1} \vee (a_{n+1} \to bd_n)$.
Semantically, $\mathrm{GL}_n$ is the logic characterized by linear frames having depth at
most $n$. We are not able to prove termination for the logics IPL + $bd_n$, but we
can implement the following terminating strategy for $\mathrm{GL}_n$. Let K(W) be the
model obtained at Step (S4) of the computation of intuitRIL$(\alpha,\mathrm{GL}_n)$:


- If K(W) is not linear, we select the axiom $\psi$ from $\mathrm{Ax}_{\mathrm{GL}}(\alpha)$ as described above.

- Otherwise, assume that K(W) is linear but not a $\mathrm{GL}_n$-model. Then K(W)
contains a chain of worlds $w_0 \subset w_1 \subset \cdots \subset w_{n+1}$. The crucial point is
that $w_{j+1} \setminus w_j$ contains at least one propositional variable from $V_\alpha$, for every
$0 \le j \le n$. Thus, we can choose a proper renaming of $bd_n$ as $\psi$.
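A brute-force semantic check shows why a chain $w_0 \subset \cdots \subset w_{n+1}$ is exactly what refutes $bd_n$. This is our own sketch and is unrelated to the learning procedure of intuitRIL. It assumes that depth is counted as the number of worlds minus one, so that $bd_0 = a_0 \vee \neg a_0$ is valid on one-world frames. For small $n$, it verifies two facts:

- $bd_n$ is forced at the root of every linear model with $n+1$ worlds;
- some valuation on a chain of $n+2$ worlds refutes $bd_n$.

The formula encoding and the function names are hypothetical.

```python
from itertools import product

def bd(n):
    """bd_0 = a0 v ~a0 and bd_{k+1} = a_{k+1} v (a_{k+1} -> bd_k), as tuple formulas."""
    f = ("or", ("var", "a0"), ("not", ("var", "a0")))
    for k in range(1, n + 1):
        ak = ("var", f"a{k}")
        f = ("or", ak, ("imp", ak, f))
    return f

def forces_chain(val, i, f):
    """Forcing at world i of the linear model 0 < 1 < ... < len(val) - 1."""
    tag = f[0]
    if tag == "var":
        return f[1] in val[i]
    if tag == "or":
        return forces_chain(val, i, f[1]) or forces_chain(val, i, f[2])
    above = range(i, len(val))  # worlds j with i <= j
    if tag == "imp":
        return all(not forces_chain(val, j, f[1]) or forces_chain(val, j, f[2]) for j in above)
    if tag == "not":
        return all(not forces_chain(val, j, f[1]) for j in above)
    raise ValueError(f"unknown connective {tag!r}")

def chain_valuations(atoms, size):
    """All monotone valuations on a chain: each atom holds from some position on, or never."""
    for starts in product(range(size + 1), repeat=len(atoms)):
        yield [{p for p, s in zip(atoms, starts) if s <= i} for i in range(size)]

def valid_on_chain(f, atoms, size):
    """Is f forced at the root of the chain with `size` worlds under every valuation?"""
    return all(forces_chain(val, 0, f) for val in chain_valuations(atoms, size))

for n in range(3):
    atoms = [f"a{k}" for k in range(n + 1)]
    # expected: True on n + 1 worlds (depth n), False on n + 2 worlds
    print(n, valid_on_chain(bd(n), atoms, n + 1), valid_on_chain(bd(n), atoms, n + 2))
```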


Another terminating logic is the Jankov Logic (see Ex. 3); here too, the
learned axiom can be chosen by renaming the wem axiom. In general,
all the logics $\mathrm{BTW}_n$ (Bounded Top Width, at most $n$ maximal worlds, see [2])
are terminating. An intriguing case is Scott Logic ST [2]: even though the class
of ST-frames is not first-order definable, we can implement a learning procedure
for ST-axioms arguing as in [7] (see Sec. 2.5.2). Some of the mentioned logics
have been implemented in intuitRIL¹.

One may wonder whether this method can be applied to other non-classical
logics or to fragments of predicate logics (these issues have already been raised
in the seminal paper [4]). A significant work in this direction is [11], where the
procedure has been applied to some modal logics. The main difference
with the original approach is that it is not possible to use a single SAT-solver;
instead, one needs a supply of SAT-solvers. This is primarily due to the fact that
the forcing relation of modal Kripke models is not persistent; thus worlds are loosely
related and must be handled by independent solvers.


¹ Available at https://github.com/cfiorentini/intuitRIL.
Abstract. The study of clause redundancy in Boolean satisfiability
(SAT) has proven significant in various respects, from fundamental insights
into preprocessing and inprocessing to the development of practical proof
checkers and new types of strong proof systems. We study liftings of
the recently proposed notion of propagation redundancy (based on a
semantic implication relationship between formulas) in the context of
maximum satisfiability (MaxSAT), where the reasoning techniques of interest
preserve the optimal cost (in contrast to preserving satisfiability
in the realm of SAT). We establish that the strongest MaxSAT-lifting of
propagation redundancy allows for changing, in a controlled way, the set
of minimal correction sets in MaxSAT. This ability is key in succinctly
expressing MaxSAT reasoning techniques and allows for obtaining correctness
proofs in a uniform way for MaxSAT reasoning techniques very
generally. Bridging theory to practice, we also provide a new MaxSAT
preprocessor incorporating such extended techniques, and show through
experiments its wide applicability in improving the performance of modern
MaxSAT solvers.


Keywords: Maximum satisfiability · Clause redundancy ·
Propagation redundancy · Preprocessing


1 Introduction 


Building heavily on the success of Boolean satisfiability (SAT) solving [13],
maximum satisfiability (MaxSAT) as the optimization extension of SAT constitutes
a viable approach to solving real-world NP-hard optimization problems [6,35].
In the context of SAT, the study of fundamental aspects of clause redundancy
[20,21,23,28,29,31,32] has proven central for developing novel types of
preprocessing and inprocessing-style solving techniques [24,29] as well as for enabling
efficient proof checkers [7,15,16,18,19,41,42] via succinct representation of most
practical SAT solving techniques. Furthermore, clause redundancy notions have
been shown to give rise to very powerful proof systems, going far beyond
resolution [22,23,30]. In contrast to viewing clause redundancy through the lens
of logical entailment, the redundancy criteria developed in this line of work are
based on a semantic implication relationship between formulas, which makes them
efficient to decide; at the same time, they are only guaranteed to preserve
satisfiability rather than logical equivalence.

The focus of this work is the study of clause redundancy in the context of
MaxSAT through lifting, from the realm of SAT, recently proposed variants of the
notion of propagation redundancy [23], which is based on a semantic implication
relationship between formulas. The study of such liftings is motivated from several
perspectives. Firstly, it has earlier been shown that a natural MaxSAT-lifting, called
SRAT [10], of the redundancy notion of resolution asymmetric
tautologies (RAT) [29] allows for establishing the general correctness of
MaxSAT-liftings of typical preprocessing techniques in SAT solving [14], alleviating the
need for correctness proofs for individual preprocessing techniques [8]. However,
the need for preserving the optimal cost in MaxSAT (as a natural counterpart
of preserving satisfiability in SAT) allows for developing MaxSAT-centric
preprocessing and solving techniques which cannot be expressed through SRAT
[2,11]. Capturing such cost-aware techniques more generally requires developing
more expressive notions of clause redundancy. Secondly, due to the fundamental
connections between solutions and so-called minimal correction sets (MCSes) of
MaxSAT instances [8,25], analyzing how clauses that are redundant in
terms of expressive notions of redundancy affect the MCSes of MaxSAT instances
can provide further understanding of the relationship between the different
notions and their fundamental impact on the solutions of MaxSAT instances.
Furthermore, in analogy with SAT, more expressive redundancy notions may
prove fruitful for developing further practical preprocessing and solving
techniques for MaxSAT.

Our main contributions are the following. We propose natural MaxSAT-liftings of the
three recently proposed variants PR, LPR and SPR of propagation redundancy
in SAT. We provide a complete characterization of the
relative expressiveness of the lifted notions CPR, CLPR and CSPR (C standing
for cost) and of their impact on the set of MCSes in MaxSAT instances.
In particular, while removing or adding clauses that are redundant in terms of CSPR and
CLPR (the latter shown to be equivalent to SRAT) does not influence the set
of MCSes underlying MaxSAT instances, CPR clauses can in fact have an influence on
MCSes. In terms of solutions, this result implies that CSPR or CLPR clauses
cannot remove minimal (in terms of sum-of-weights of falsified soft clauses)
solutions of MaxSAT instances, while CPR clauses can.

The theoretically greater effect that CPR clauses have on the solutions
of MaxSAT instances is key for succinctly expressing further MaxSAT reasoning
techniques via CPR and allows for obtaining correctness proofs in a uniform
way for MaxSAT reasoning techniques very generally; we give concrete examples
of how CPR captures techniques not in the reach of SRAT. Bridging to practical
preprocessing in MaxSAT, we also provide a new MaxSAT preprocessor
extended with such techniques. Finally, we provide large-scale empirical evidence
on the positive impact of the preprocessor on the runtimes of various modern
MaxSAT solvers, covering both complete and incomplete approaches. The evidence suggests
that integrating extensive preprocessing going beyond the scope of SRAT appears
beneficial for speeding up modern MaxSAT solvers.

An extended version of this paper, with formal proofs missing from this 
version, is available via the authors’ homepages. 


2 Preliminaries 


SAT. For a Boolean variable $x$ there are two literals, the positive $x$ and the
negative $\neg x$, with $\neg\neg l = l$ for a literal $l$. A clause $C$ is a set (disjunction) of
literals and a CNF formula $F$ a set (conjunction) of clauses. We assume that all
clauses are non-tautological, i.e., do not contain both a literal and its negation.
The set $\mathrm{var}(C) = \{x \mid x \in C \text{ or } \neg x \in C\}$ consists of the variables of the
literals in $C$. The sets of variables and literals of a formula are
$\mathrm{var}(F) = \bigcup_{C \in F} \mathrm{var}(C)$ and $\mathrm{lit}(F) = \bigcup_{C \in F} C$, respectively. For a set $L$ of
literals, the set $\neg L = \{\neg l \mid l \in L\}$ consists of the negations of the literals in $L$.
A (truth) assignment $\tau$ is a set of literals for which $x \notin \tau$ or $\neg x \notin \tau$ for any
variable $x$. For a literal $l$ we denote $l \in \tau$ by $\tau(l) = 1$ and $\neg l \in \tau$ by $\tau(l) = 0$ or
$\tau(\neg l) = 1$ as convenient, and say that $\tau$ assigns $l$ the value 1 and 0, respectively.
The set $\mathrm{var}(\tau) = \{x \mid x \in \tau \text{ or } \neg x \in \tau\}$ is the range of $\tau$, i.e., it consists of the
variables $\tau$ assigns a value for. For a set $L$ of literals and an assignment $\tau$, the
assignment $\tau_L = (\tau \setminus \neg L) \cup L$ is obtained from $\tau$ by setting $\tau_L(l) = 1$ for all
$l \in L$ and $\tau_L(l) = \tau(l)$ for all $l \notin L$ assigned by $\tau$. For a literal $l$, $\tau_l$ stands for
$\tau_{\{l\}}$. An assignment $\tau$ satisfies a clause $C$ ($\tau(C) = 1$) if $\tau \cap C \neq \emptyset$, or equivalently
if $\tau(l) = 1$ for some $l \in C$, and a CNF formula $F$ ($\tau(F) = 1$) if it satisfies each
clause $C \in F$. A CNF formula is satisfiable if there is an assignment that satisfies
it, and otherwise unsatisfiable. The empty formula $\top$ is satisfied by any truth
assignment and the empty clause $\bot$ is unsatisfiable. The Boolean satisfiability
problem (SAT) asks to decide whether a given CNF formula $F$ is satisfiable.
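The Python sketch below makes these conventions concrete. The encoding is our own and is not taken from the paper:

- a literal is a string such as `"x"` or `"-x"`;
- a clause is a `frozenset` of literals;
- a CNF formula is a list of clauses;
- an assignment $\tau$ is a set of literals.

The helper names are hypothetical.

```python
def neg(lit):
    """Negate a literal written as 'x' or '-x'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    """Variable underlying a literal."""
    return lit[1:] if lit.startswith("-") else lit

def is_assignment(tau):
    """No variable is assigned both values."""
    return all(neg(l) not in tau for l in tau)

def value(tau, lit):
    """tau(lit): 1 if lit is in tau, 0 if its negation is, None if unassigned."""
    if lit in tau:
        return 1
    if neg(lit) in tau:
        return 0
    return None

def update(tau, L):
    """tau_L = (tau minus the negations of L) union L: set every literal of L to 1."""
    return (set(tau) - {neg(l) for l in L}) | set(L)

def satisfies(tau, formula):
    """tau(F) = 1 iff tau shares a literal with every clause of F."""
    return all(set(tau) & set(clause) for clause in formula)

tau = {"x", "-y"}
print(value(tau, "y"), sorted(update(tau, {"y"})))  # 0 ['x', 'y']
```

Under this encoding the empty formula `[]` is satisfied by every assignment, and a formula containing `frozenset()` (the empty clause) by none.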
Given two CNF formulas $F_1$ and $F_2$, $F_1$ entails $F_2$ ($F_1 \models F_2$) if any
assignment $\tau$ that satisfies $F_1$ and only assigns variables of $F_1$ (i.e., for which
$\mathrm{var}(\tau) \subseteq \mathrm{var}(F_1)$) can be extended into an assignment $\tau' \supseteq \tau$ that satisfies $F_2$.
The formulas are equisatisfiable if $F_1$ is satisfiable iff $F_2$ is. An assignment $\tau$ is
complete for a CNF formula $F$ if $\mathrm{var}(F) \subseteq \mathrm{var}(\tau)$, and otherwise partial for $F$.
The restriction $F|_\tau$ of $F$ wrt a partial assignment $\tau$ is a CNF formula obtained
by (i) removing from $F$ all clauses that are satisfied by $\tau$ and (ii) removing from
the remaining clauses of $F$ literals $l$ for which $\tau(l) = 0$. Applying unit propagation
on $F$ refers to iteratively restricting $F$ by $\tau = \{l\}$ for a unit clause (clause
with a single literal) $(l) \in F$ until the resulting (unique) formula, denoted by
$\mathrm{UP}(F)$, contains no unit clauses or some clause in $F$ becomes empty. We say that
unit propagation on $F$ derives a conflict if $\mathrm{UP}(F)$ contains the empty clause. The
formula $F_1$ implies $F_2$ under unit propagation ($F_1 \vdash_1 F_2$) if, for each $C \in F_2$,
unit propagation derives a conflict in $F_1 \wedge \{(\neg l) \mid l \in C\}$. Note that $F_1 \vdash_1 F_2$
implies $F_1 \models F_2$, but not vice versa in general.
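Restriction, unit propagation and the relation $\vdash_1$ can be sketched in the same string-literal encoding. The encoding is our own and the helper names are hypothetical. `unit_propagate` picks unit clauses in list order; the checks below only ask whether the empty clause appears.

```python
def neg(lit):
    """Negate a literal written as 'x' or '-x'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def restrict(formula, tau):
    """F|_tau: drop clauses satisfied by tau, then drop the literals tau falsifies."""
    tau = set(tau)
    return [frozenset(l for l in c if neg(l) not in tau)
            for c in formula if not (set(c) & tau)]

def unit_propagate(formula):
    """UP(F): restrict by unit clauses until none is left or some clause is empty."""
    formula = [frozenset(c) for c in formula]
    while not any(len(c) == 0 for c in formula):
        unit = next((c for c in formula if len(c) == 1), None)
        if unit is None:
            break
        formula = restrict(formula, unit)
    return formula

def up_conflict(formula):
    """Does unit propagation on F derive a conflict (the empty clause)?"""
    return any(len(c) == 0 for c in unit_propagate(formula))

def implies_up(f1, f2):
    """F1 |-_1 F2: for each C in F2, UP on F1 plus the units (-l), l in C, conflicts."""
    return all(up_conflict(list(f1) + [frozenset({neg(l)}) for l in c]) for c in f2)

F = [frozenset({"x", "y"}), frozenset({"-x", "y"})]
print(implies_up(F, [frozenset({"y"})]), implies_up(F, [frozenset({"x"})]))  # True False

# F_unsat entails every clause, yet unit propagation alone cannot refute it.
F_unsat = [frozenset(c) for c in (["x", "y"], ["x", "-y"], ["-x", "y"], ["-x", "-y"])]
print(implies_up(F_unsat, [frozenset({"z"})]))  # False
```

The last call illustrates why $\models$ does not imply $\vdash_1$. `F_unsat` is unsatisfiable and therefore entails $(z)$, but no unit clause ever arises during propagation.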


Maximum Satisfiability. An instance $\mathcal{F} = (F_H, F_S, w)$ of (weighted partial)
maximum satisfiability (MaxSAT for short) consists of two CNF formulas, the
hard clauses $F_H$ and the soft clauses $F_S$, and a weight function $w\colon F_S \to \mathbb{N}$ that
assigns a positive weight to each soft clause.

Without loss of generality, we assume that every soft clause $C \in F_S$ is unit¹.
The set of blocking literals $B(\mathcal{F}) = \{l \mid (\neg l) \in F_S\}$ consists of the literals
$l$ the negation of which occurs in $F_S$. The weight function $w$ is extended to
blocking literals by $w(l) = w((\neg l))$. Without loss of generality, we also assume
that $l \in \mathrm{lit}(F_H)$ for all $l \in B(\mathcal{F})$². Instead of using the definition of MaxSAT
in terms of hard and soft clauses, we will from now on view a MaxSAT instance
$\mathcal{F} = (F_H, B(\mathcal{F}), w)$ as a set $F_H$ of hard clauses, a set $B(\mathcal{F})$ of blocking literals
and a weight function $w\colon B(\mathcal{F}) \to \mathbb{N}$.

Any complete assignment $\tau$ over $\mathrm{var}(F_H)$ that satisfies $F_H$ is a solution to
$\mathcal{F}$. The cost $\mathrm{COST}(\mathcal{F},\tau) = \sum_{l \in B(\mathcal{F})} \tau(l)\,w(l)$ of a solution $\tau$ is the sum of weights
of blocking literals it assigns to 1³. The cost of a complete assignment $\tau$ that
does not satisfy $F_H$ is defined as $\infty$. The cost of a partial assignment $\tau$ over
$\mathrm{var}(F_H)$ is defined as the cost of smallest-cost assignments that are extensions
of $\tau$. A solution $\tau^o$ is optimal if $\mathrm{COST}(\mathcal{F},\tau^o) \le \mathrm{COST}(\mathcal{F},\tau)$ holds for all solutions
$\tau$ of $\mathcal{F}$. The cost of the optimal solutions of a MaxSAT instance is denoted by
$\mathrm{COST}(\mathcal{F})$, with $\mathrm{COST}(\mathcal{F}) = \infty$ iff $F_H$ is unsatisfiable. In MaxSAT the task is to
find an optimal solution to a given MaxSAT instance.


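Example 1. Let $\mathcal{F} = (F_H, B(\mathcal{F}), w)$ be a MaxSAT instance with
$F_H = \{(x \vee b_1), (\neg x \vee b_2), (y \vee b_3 \vee b_4), (z \vee \neg y \vee b_4), (\neg z)\}$ and $B(\mathcal{F}) = \{b_1, b_2, b_3, b_4\}$, having
$w(b_1) = w(b_4) = 1$, $w(b_2) = 2$ and $w(b_3) = 8$. The assignment
$\tau = \{b_1, b_4, \neg b_2, \neg b_3, \neg x, \neg z, y\}$ is an example of an optimal solution of $\mathcal{F}$ and has
$\mathrm{COST}(\mathcal{F},\tau) = \mathrm{COST}(\mathcal{F}) = 2$.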
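Because the instance is tiny, the costs in Example 1 can be confirmed by brute force. The sketch below uses a hypothetical helper `cost` and the same string-literal encoding as our other sketches. It enumerates all extensions of a possibly partial assignment and returns the cheapest one that satisfies the hard clauses, following the definition of the cost of partial assignments given above. It is exponential and only meant for checking small examples.

```python
import math
from itertools import product

def neg(lit):
    """Negate a literal written as 'x' or '-x'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    """Variable underlying a literal."""
    return lit[1:] if lit.startswith("-") else lit

def cost(hard, weights, tau=frozenset()):
    """COST(F, tau): cheapest extension of tau satisfying the hard clauses (inf if none).
    weights maps each blocking literal l to w(l). Exponential brute force."""
    assigned = {var(l) for l in tau}
    variables = {var(l) for c in hard for l in c} | {var(l) for l in weights}
    free = sorted(variables - assigned)
    best = math.inf
    for bits in product([True, False], repeat=len(free)):
        full = set(tau) | {v if b else neg(v) for v, b in zip(free, bits)}
        if all(full & c for c in hard):
            best = min(best, sum(wt for l, wt in weights.items() if l in full))
    return best

# Example 1
F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
w = {"b1": 1, "b2": 2, "b3": 8, "b4": 1}
tau = {"b1", "b4", "-b2", "-b3", "-x", "-z", "y"}
print(cost(F_H, w, tau), cost(F_H, w))  # 2 2
```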


With a slight abuse of notation, we denote by $\mathcal{F} \wedge C = (F_H \cup \{C\}, B(\mathcal{F} \wedge C), w)$
the MaxSAT instance obtained by adding a clause $C$ to an instance
$\mathcal{F} = (F_H, B(\mathcal{F}), w)$. Adding clauses may introduce new blocking literals but
does not change the weights of already existing ones, i.e., $B(\mathcal{F}) \subseteq B(\mathcal{F} \wedge C)$ and
$w^{\mathcal{F} \wedge C}(l) = w^{\mathcal{F}}(l)$ for all $l \in B(\mathcal{F})$.


Correction Sets. For a MaxSAT instance $\mathcal{F}$, a subset $cs \subseteq B(\mathcal{F})$ is a minimal
correction set (MCS) of $\mathcal{F}$ if (i) $F_H \wedge \bigwedge_{l \in B(\mathcal{F}) \setminus cs} (\neg l)$ is satisfiable and
(ii) $F_H \wedge \bigwedge_{l \in B(\mathcal{F}) \setminus cs_s} (\neg l)$ is unsatisfiable for every $cs_s \subsetneq cs$. In words, $cs$ is an MCS if it
is a subset-minimal set of blocking literals that is included in some solution $\tau$ of
$\mathcal{F}$⁴. We denote the set of MCSes of $\mathcal{F}$ by $\mathrm{mcs}(\mathcal{F})$.

¹ A soft clause $C$ can be replaced by the hard clause $C \vee x$ and soft clause $(\neg x)$, where
$x$ is a variable not in $\mathrm{var}(F_H \wedge F_S)$, without affecting the costs of solutions.

² Otherwise the instance can be simplified by unit propagating $\neg l$ without changing
the costs of solutions. As a consequence, any complete assignment for $F_H$ will be
complete for $F_H \wedge F_S$ as well.

³ This is equivalent to the sum of weights of soft clauses not satisfied by $\tau$.

There is a tight connection between the MCSes and solutions of MaxSAT
instances. Given an optimal solution $\tau^o$ of a MaxSAT instance $\mathcal{F}$, the set
$\tau^o \cap B(\mathcal{F})$ is an MCS of $\mathcal{F}$. In the other direction, for any $cs \in \mathrm{mcs}(\mathcal{F})$, there is a (not
necessarily optimal) solution $\tau^{cs}$ such that $cs = B(\mathcal{F}) \cap \tau^{cs}$ and
$\mathrm{COST}(\mathcal{F},\tau^{cs}) = \sum_{l \in cs} w(l)$.


Example 2. Consider the instance $\mathcal{F}$ from Example 1. The set $\{b_1, b_4\} \in \mathrm{mcs}(\mathcal{F})$
is an MCS of $\mathcal{F}$ that corresponds to the optimal solution $\tau$ described in
Example 1. The set $\{b_2, b_3\} \in \mathrm{mcs}(\mathcal{F})$ is another example of an MCS that
instead corresponds to the solution $\tau_2 = \{b_2, b_3, \neg b_1, \neg b_4, x, \neg z, \neg y\}$, for which
$\mathrm{COST}(\mathcal{F},\tau_2) = 10$.
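The MCS definition can likewise be checked by brute force. This sketch (hypothetical helpers `satisfiable` and `mcses`) enumerates candidate sets $cs$ by increasing size. It keeps those for which $F_H \wedge \bigwedge_{l \in B(\mathcal{F}) \setminus cs} (\neg l)$ is satisfiable and that contain no previously found MCS. This relies on the fact that allowing more blocking literals to be true can only preserve satisfiability. On Example 1 it reports $\{b_1, b_4\}$ and $\{b_2, b_3\}$, and also the MCSes $\{b_1, b_3\}$ and $\{b_2, b_4\}$.

```python
from itertools import combinations, product

def neg(lit):
    """Negate a literal written as 'x' or '-x'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    """Variable underlying a literal."""
    return lit[1:] if lit.startswith("-") else lit

def satisfiable(clauses):
    """Brute-force SAT check (tiny formulas only)."""
    variables = sorted({var(l) for c in clauses for l in c})
    for bits in product([True, False], repeat=len(variables)):
        tau = {v if b else neg(v) for v, b in zip(variables, bits)}
        if all(tau & c for c in clauses):
            return True
    return False

def mcses(hard, blocking):
    """Subset-minimal cs such that F_H plus (-l) for every l in blocking but not in cs is SAT."""
    def feasible(cs):
        return satisfiable(list(hard) + [frozenset({neg(l)}) for l in blocking if l not in cs])
    found = []
    for k in range(len(blocking) + 1):  # increasing size: smaller MCSes are found first
        for cs in map(frozenset, combinations(sorted(blocking), k)):
            if feasible(cs) and not any(m < cs for m in found):
                found.append(cs)
    return set(found)

F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
B = {"b1", "b2", "b3", "b4"}
print(sorted(sorted(m) for m in mcses(F_H, B)))
# [['b1', 'b3'], ['b1', 'b4'], ['b2', 'b3'], ['b2', 'b4']]
```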


3 Propagation Redundancy in MaxSAT 


We extend recent work [23] on characterizing redundant clauses using semantic 
implication in the context of SAT to MaxSAT. In particular, we provide natural 
counterparts for several recently-proposed strong notions of redundancy in SAT 
to the context of MaxSAT and analyze the relationships between them. 

In the context of SAT, the most general notion of clause redundancy is seemingly
simple: a clause $C$ is redundant for a formula $F$ if it does not affect its
satisfiability, i.e., clause $C$ is redundant wrt a CNF formula $F$ if $F$ and $F \wedge \{C\}$
are equisatisfiable [20,29]. This allows for the set of satisfying assignments to
change, and does not require preserving logical equivalence; we are only
interested in satisfiability.

A natural counterpart for this general view in MaxSAT is that the cost of 
optimal solutions (rather than the set of optimal solutions) should be preserved. 


Definition 1. A clause $C$ is redundant wrt a MaxSAT instance $\mathcal{F}$ if
$\mathrm{COST}(\mathcal{F}) = \mathrm{COST}(\mathcal{F} \wedge C)$.


This coincides with the counterpart in SAT whenever $B(\mathcal{F}) = \emptyset$, since then
the cost of a MaxSAT instance $\mathcal{F}$ is either 0 (if $F_H$ is satisfiable) or $\infty$ (if $F_H$
is unsatisfiable). Unless explicitly specified, we will use the term “redundant” to
refer to Definition 1.

Following [23], we say that a clause $C$ blocks the assignment $\neg C$ (and all
assignments $\tau$ for which $\neg C \subseteq \tau$). As shown in the context of SAT [23], a clause
$C$ is redundant (in the equisatisfiability sense) for a CNF formula $F$ if $C$ does not
block all of its satisfying assignments. The counterpart that arises in the context
of MaxSAT from Definition 1 is that the cost of at least one of the solutions not
blocked by $C$ is no greater than the cost of $\neg C$.


Proposition 1. A clause $C$ is redundant wrt a MaxSAT instance $\mathcal{F}$ if and
only if there is an assignment $\tau$ for which
$\mathrm{COST}(\mathcal{F} \wedge C,\tau) = \mathrm{COST}(\mathcal{F},\tau) \le \mathrm{COST}(\mathcal{F},\neg C)$.


⁴ This is equivalent to a subset-minimal set of soft clauses falsified by $\tau$.
The equality $\mathrm{COST}(\mathcal{F} \wedge C,\tau) = \mathrm{COST}(\mathcal{F},\tau)$ of Proposition 1 is necessary, as
witnessed by the following example.


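Example 3. Consider the MaxSAT instance $\mathcal{F}$ detailed in Example 1, the clause
$C = (b_5)$ with $b_5 \in B(\mathcal{F} \wedge C)$, and the assignment $\tau = \{b_5\}$. Then
$2 = \mathrm{COST}(\mathcal{F},\tau) \le \mathrm{COST}(\mathcal{F},\neg C) = 2$, but $C$ is not redundant since
$\mathrm{COST}(\mathcal{F} \wedge C) = 2 + w^{\mathcal{F} \wedge C}(b_5) > 2 = \mathrm{COST}(\mathcal{F})$.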
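Definition 1 and Example 3 can be replayed with a brute-force `cost` helper (our own sketch, with a hypothetical `redundant` function). Example 3 leaves $w(b_5)$ unspecified, so the code assumes the hypothetical value $w(b_5) = 1$; any positive weight leads to the same conclusion. As a contrast, it also checks the clause $(\neg x \vee b_5)$ from Example 4, which leaves the optimal cost unchanged.

```python
import math
from itertools import product

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    return lit[1:] if lit.startswith("-") else lit

def cost(hard, weights, tau=frozenset()):
    """COST(F, tau) by brute force over the unassigned variables (inf if unsatisfiable)."""
    variables = {var(l) for c in hard for l in c} | {var(l) for l in weights}
    free = sorted(variables - {var(l) for l in tau})
    best = math.inf
    for bits in product([True, False], repeat=len(free)):
        full = set(tau) | {v if b else neg(v) for v, b in zip(free, bits)}
        if all(full & c for c in hard):
            best = min(best, sum(wt for l, wt in weights.items() if l in full))
    return best

def redundant(hard, weights, clause, new_weights=None):
    """Definition 1: C is redundant wrt F iff COST(F) = COST(F and C).
    new_weights holds weights of blocking literals that C introduces."""
    extended = {**weights, **(new_weights or {})}
    return cost(hard, weights) == cost(list(hard) + [frozenset(clause)], extended)

F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
w = {"b1": 1, "b2": 2, "b3": 8, "b4": 1}
w_b5 = {"b5": 1}  # hypothetical weight: Example 3 does not fix w(b5)

print(cost(F_H, w, {"b5"}), cost(F_H, w, {"-b5"}))               # 2 2
print(cost(F_H + [frozenset({"b5"})], {**w, **w_b5}, {"b5"}))   # 3
print(redundant(F_H, w, {"b5"}, w_b5))                          # False
print(redundant(F_H, w, {"-x", "b5"}, w_b5))                    # True
```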


Proposition 1 provides a necessary and sufficient condition for a clause $C$ being redundant.
Further requirements on the assignment $\tau$ can be imposed without loss of
generality.


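Theorem 1. A non-empty clause $C$ is redundant wrt a MaxSAT instance
$\mathcal{F} = (F_H, B(\mathcal{F}), w)$ if and only if there is an assignment $\tau$ such that

(i) $\tau(C) = 1$, (ii) $F_H|_{\neg C} \models F_H|_\tau$, and

(iii) $\mathrm{COST}(\mathcal{F} \wedge C,\tau) = \mathrm{COST}(\mathcal{F},\tau) \le \mathrm{COST}(\mathcal{F},\neg C)$.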


As we will see later, one reason for including the two additional conditions (i)
and (ii) in Theorem 1 is to allow defining different restrictions of redundancy notions, some of
which allow for efficiently identifying redundant clauses.


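Example 4. Consider the instance $\mathcal{F} = (F_H, B(\mathcal{F}), w)$ detailed in Example 1, a
clause $C = (\neg x \vee b_5)$ for a $b_5 \in B(\mathcal{F} \wedge C)$, and an assignment $\tau = \{\neg x, b_1\}$. Then
$\tau(C) = 1$, $\{(b_2), (y \vee b_3 \vee b_4), (z \vee \neg y \vee b_4), (\neg z)\} = F_H|_{\neg C} \models F_H|_\tau = \{(y \vee b_3 \vee b_4), (z \vee \neg y \vee b_4), (\neg z)\}$,
and $2 = \mathrm{COST}(\mathcal{F} \wedge C,\tau) = \mathrm{COST}(\mathcal{F},\tau) \le \mathrm{COST}(\mathcal{F},\neg C) = 3$.
We conclude that $C$ is redundant.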
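The three conditions of Theorem 1 for Example 4 can be checked mechanically. The sketch below implements entailment exactly as defined in the preliminaries, by brute force: every satisfying assignment over $\mathrm{var}(F_1)$ must extend to a model of $F_2$. Since the example does not fix $w(b_5)$, the code assumes the hypothetical weight $w(b_5) = 1$. All helper names are our own.

```python
import math
from itertools import product

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    return lit[1:] if lit.startswith("-") else lit

def assignments(variables):
    """All complete assignments (sets of literals) over the given variables."""
    variables = sorted(variables)
    for bits in product([True, False], repeat=len(variables)):
        yield {v if b else neg(v) for v, b in zip(variables, bits)}

def satisfies(tau, formula):
    return all(set(tau) & set(c) for c in formula)

def restrict(formula, tau):
    """F|_tau: drop clauses satisfied by tau, then drop the literals tau falsifies."""
    tau = set(tau)
    return [frozenset(l for l in c if neg(l) not in tau) for c in formula if not (set(c) & tau)]

def entails(f1, f2):
    """F1 |= F2: every assignment over var(F1) satisfying F1 extends to a model of F2."""
    for tau in assignments({var(l) for c in f1 for l in c}):
        if satisfies(tau, f1):
            rest = restrict(f2, tau)
            rest_vars = {var(l) for c in rest for l in c}
            if not any(satisfies(s, rest) for s in assignments(rest_vars)):
                return False
    return True

def cost(hard, weights, tau=frozenset()):
    """COST(F, tau) by brute force over the unassigned variables (inf if unsatisfiable)."""
    variables = {var(l) for c in hard for l in c} | {var(l) for l in weights}
    best = math.inf
    for ext in assignments(variables - {var(l) for l in tau}):
        full = set(tau) | ext
        if satisfies(full, hard):
            best = min(best, sum(wt for l, wt in weights.items() if l in full))
    return best

F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
w = {"b1": 1, "b2": 2, "b3": 8, "b4": 1}
C = frozenset({"-x", "b5"})
not_C = {neg(l) for l in C}      # the assignment blocked by C: {x, -b5}
tau = {"-x", "b1"}
w_C = {**w, "b5": 1}             # hypothetical weight for the new blocking literal b5

cond_i = bool(tau & C)
cond_ii = entails(restrict(F_H, not_C), restrict(F_H, tau))
cond_iii = (cost(F_H + [C], w_C, tau), cost(F_H, w, tau), cost(F_H, w, not_C))
print(cond_i, cond_ii, cond_iii)  # True True (2, 2, 3)
```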


In the context of SAT, imposing restrictions on the entailment operator and
the set of assignments has been shown to give rise to several interesting
redundancy notions which hold promise of practical applicability. These include three
variants (LPR, SPR, and PR) of so-called (literal/set) propagation redundancy
[23]. For completeness we restate the definitions of these three notions. A clause $C$
is LPR wrt a CNF formula $F$ if there is a literal $l \in C$ for which $F|_{\neg C} \vdash_1 F|_{(\neg C)_l}$,
SPR if the same holds for a subset $L \subseteq C$, and PR if there exists an assignment
$\tau$ that satisfies $C$ and for which $F|_{\neg C} \vdash_1 F|_\tau$. With the help of Theorem 1, we
obtain counterparts for these notions in the context of MaxSAT.


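Definition 2. With respect to an instance $\mathcal{F} = (F_H, B(\mathcal{F}), w)$, a clause $C$ is


- cost literal propagation redundant (CLPR) (on $l$) if there is a literal $l \in C$
for which either (i) $\bot \in \mathrm{UP}(F_H|_{\neg C})$ or (ii) $l \notin B(\mathcal{F} \wedge C)$ and
$F_H|_{\neg C} \vdash_1 F_H|_{(\neg C)_l}$,

- cost set propagation redundant (CSPR) (on $L$) if there is a set
$L \subseteq C \setminus B(\mathcal{F} \wedge C)$ of literals for which $F_H|_{\neg C} \vdash_1 F_H|_{(\neg C)_L}$, and

- cost propagation redundant (CPR) if there is an assignment $\tau$ such that
(i) $\tau(C) = 1$, (ii) $F_H|_{\neg C} \vdash_1 F_H|_\tau$, and
(iii) $\mathrm{COST}(\mathcal{F} \wedge C,\tau) = \mathrm{COST}(\mathcal{F},\tau) \le \mathrm{COST}(\mathcal{F},\neg C)$.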
Example 5. Consider again $\mathcal{F} = (F_H, B(\mathcal{F}), w)$ from Example 1. The clause
$D = (b_1 \vee b_2)$ is CLPR wrt $\mathcal{F}$ since $\bot \in \mathrm{UP}(F_H|_{\neg D})$, as $\{(x), (\neg x)\} \subseteq F_H|_{\neg D}$. As for
the redundant clause $C$ and assignment $\tau$ detailed in Example 4, we have that
$C$ is CPR, since $F_H|_\tau \subseteq F_H|_{\neg C}$, which implies $F_H|_{\neg C} \vdash_1 F_H|_\tau$.
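Unlike CPR, the CLPR test of Definition 2 only needs unit propagation, so it is cheap to implement. The following sketch uses a hypothetical function `clpr`, where `blocking` stands for $B(\mathcal{F} \wedge C)$. It returns:

- `"conflict"` when case (i) applies;
- the witnessing literal when case (ii) applies;
- `None` otherwise.

Besides the clause $D$ of Example 5, we tried the clause $C$ of Example 4, which is not CLPR, and the clause $(\neg y)$, for which case (ii) applies on $\neg y$.

```python
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def restrict(formula, tau):
    """F|_tau: drop clauses satisfied by tau, then drop the literals tau falsifies."""
    tau = set(tau)
    return [frozenset(l for l in c if neg(l) not in tau) for c in formula if not (set(c) & tau)]

def up_conflict(formula):
    """Unit propagation; True iff it derives the empty clause."""
    formula = [frozenset(c) for c in formula]
    while not any(len(c) == 0 for c in formula):
        unit = next((c for c in formula if len(c) == 1), None)
        if unit is None:
            return False
        formula = restrict(formula, unit)
    return True

def implies_up(f1, f2):
    """F1 |-_1 F2."""
    return all(up_conflict(list(f1) + [frozenset({neg(l)}) for l in c]) for c in f2)

def clpr(hard, blocking, clause):
    """CLPR test of Definition 2, with blocking = B(F and C)."""
    not_c = {neg(l) for l in clause}
    reduced = restrict(hard, not_c)
    if up_conflict(reduced):
        return "conflict"                        # case (i)
    for l in sorted(clause):
        if l not in blocking:
            flipped = (not_c - {neg(l)}) | {l}   # the assignment (not C)_l
            if implies_up(reduced, restrict(hard, flipped)):
                return l                         # case (ii) on l
    return None

F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
B = {"b1", "b2", "b3", "b4"}
print(clpr(F_H, B, {"b1", "b2"}))           # conflict  (clause D of Example 5)
print(clpr(F_H, B | {"b5"}, {"-x", "b5"}))  # None      (clause C of Example 4)
print(clpr(F_H, B, {"-y"}))                 # -y
```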


We begin the analysis of the relationship between these redundancy notions
by showing that CSPR (and by extension CLPR) clauses also satisfy the
MaxSAT-centric condition (iii) of Theorem 1. Assume that $C$ is CSPR wrt an
instance $\mathcal{F} = (F_H, B(\mathcal{F}), w)$ on the set $L$.


Lemma 1. Let $\tau \supseteq \neg C$ be a solution of $\mathcal{F}$. Then $\mathrm{COST}(\mathcal{F},\tau) \ge \mathrm{COST}(\mathcal{F},\tau_L)$.


The following corollary of Lemma 1 establishes that CSPR and CLPR clauses 
are redundant according to Definition 1. 


Corollary 1. $\mathrm{COST}(\mathcal{F} \wedge C,(\neg C)_L) = \mathrm{COST}(\mathcal{F},(\neg C)_L) \le \mathrm{COST}(\mathcal{F},\neg C)$.


The fact that CPR clauses are redundant follows trivially from the fact that
$F_H|_{\neg C} \vdash_1 F_H|_\tau$ implies $F_H|_{\neg C} \models F_H|_\tau$. However, given a solution $\delta$ that does
not satisfy a CPR clause $C$, the next example demonstrates that the assignment
$\delta_\tau$ need not have a cost at most that of $\delta$. Stated another way, the example
demonstrates that an observation similar to Lemma 1 does not hold for CPR
clauses in general.


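Example 6. Consider a MaxSAT instance $\mathcal{F} = (F_H, B(\mathcal{F}), w)$ having
$F_H = \{(x \vee b_1), (\neg x \vee b_2)\}$, $B(\mathcal{F}) = \{b_1, b_2\}$ and $w(b_1) = w(b_2) = 1$. The clause $C =
(x)$ is CPR wrt $\mathcal{F}$: the assignment $\tau = \{x, b_2\}$ satisfies the three conditions of
Definition 2. Now $\delta = \{\neg x, b_1\}$ is a solution of $\mathcal{F}$ that does not satisfy $C$, for
which $\delta_\tau = \{x, b_1, b_2\}$ and $1 = \mathrm{COST}(\mathcal{F},\delta) < 2 = \mathrm{COST}(\mathcal{F},\delta_\tau)$.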
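Example 6 in code: the sketch below uses hypothetical helpers `cost` (brute force) and `update`, where `update(tau, L)` computes $\tau_L = (\tau \setminus \neg L) \cup L$. It checks condition (iii) of Definition 2 for $\tau$ and shows that $\delta_\tau$ is more expensive than $\delta$.

```python
import math
from itertools import product

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    return lit[1:] if lit.startswith("-") else lit

def cost(hard, weights, tau=frozenset()):
    """COST(F, tau) by brute force over the unassigned variables (inf if unsatisfiable)."""
    variables = {var(l) for c in hard for l in c} | {var(l) for l in weights}
    free = sorted(variables - {var(l) for l in tau})
    best = math.inf
    for bits in product([True, False], repeat=len(free)):
        full = set(tau) | {v if b else neg(v) for v, b in zip(free, bits)}
        if all(full & c for c in hard):
            best = min(best, sum(wt for l, wt in weights.items() if l in full))
    return best

def update(tau, lits):
    """tau_L = (tau minus the negations of L) union L."""
    return (set(tau) - {neg(l) for l in lits}) | set(lits)

F_H = [frozenset({"x", "b1"}), frozenset({"-x", "b2"})]
w = {"b1": 1, "b2": 1}
C = frozenset({"x"})
tau = {"x", "b2"}
delta = {"-x", "b1"}

# Condition (iii): COST(F and C, tau) = COST(F, tau) <= COST(F, not C)
print(cost(F_H + [C], w, tau), cost(F_H, w, tau), cost(F_H, w, {"-x"}))  # 1 1 1
delta_tau = update(delta, tau)
print(sorted(delta_tau), cost(F_H, w, delta), cost(F_H, w, delta_tau))  # ['b1', 'b2', 'x'] 1 2
```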


Similarly as in the context of SAT, verifying that a clause is CSPR (and
by extension CLPR) can be done efficiently. However, in contrast to SAT, we
conjecture that verifying that a clause is CPR cannot in the general case be
done efficiently, even if the assignment $\tau$ is given. While we will not go into
detail on the complexity of identifying CPR clauses, the following proposition
gives some support for our conjecture.


Proposition 2. Let $\mathcal{F}$ be an instance and $k \in \mathbb{N}$. There is another instance
$\mathcal{F}^k$, a clause $C$, and an assignment $\tau$ such that $C$ is CPR wrt $\mathcal{F}^k$ if and only
if $\mathrm{COST}(\mathcal{F}) \ge k$.


As deciding whether $\mathrm{COST}(\mathcal{F}) \ge k$ is coNP-complete in the general case, Proposition 2
suggests that it may not be possible to decide in polynomial time whether an assignment
$\tau$ satisfies the three conditions of Definition 2, unless P = NP. This is in contrast to
SAT, where verifying propagation redundancy can be done in polynomial time
if the assignment $\tau$ is given, but is NP-complete if not [24].

The following observations establish a more precise relationship between the 
redundancy notions. For the following, let RED(F) denote the set of clauses that 
are redundant wrt a MaxSAT instance F according to Definition 1. Analogously, 
the sets CPR(F), CSPR(F) and CLPR(F) consist of the clauses that are CPR, 
CSPR and CLPR wrt F, respectively. 
Observation 1 $\mathrm{CLPR}(\mathcal{F}) \subseteq \mathrm{CSPR}(\mathcal{F}) \subseteq \mathrm{CPR}(\mathcal{F}) \subseteq \mathrm{RED}(\mathcal{F})$ holds for any
MaxSAT instance $\mathcal{F}$.


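Observation 2 There are MaxSAT instances $\mathcal{F}_1$, $\mathcal{F}_2$ and $\mathcal{F}_3$ for which
$\mathrm{CLPR}(\mathcal{F}_1) \subsetneq \mathrm{CSPR}(\mathcal{F}_1)$, $\mathrm{CSPR}(\mathcal{F}_2) \subsetneq \mathrm{CPR}(\mathcal{F}_2)$ and $\mathrm{CPR}(\mathcal{F}_3) \subsetneq \mathrm{RED}(\mathcal{F}_3)$.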


The proofs of Observations 1 and 2 follow directly from known results in 
the context of SAT [23] by noting that any CNF formula can be viewed as an 
instance of MaxSAT without blocking literals. 

For a MaxSAT-centric observation on the relationship between the redun- 
dancy notions, we note that the concept of redundancy and CPR coincide for 
any MaxSAT instance that has solutions. 


Observation 3 $\mathrm{CPR}(\mathcal{F}) = \mathrm{RED}(\mathcal{F})$ holds for any MaxSAT instance $\mathcal{F}$ with
$\mathrm{COST}(\mathcal{F}) < \infty$.


We note that a result similar to Observation 3 could be formulated in the context 
of SAT. The SAT-counterpart would state that the concept of redundancy (in the 
equisatisfiability sense) coincides with the concept of propagation redundancy 
for SAT solving (defined e.g. in [23]) for satisfiable CNF formulas. However, 
assuming that a CNF formula is satisfiable is very restrictive in the context 
of SAT. In contrast, it is natural to assume that a MaxSAT instance admits 
solutions. 

We end this section with a simple observation: adding a redundant clause $C$
to a MaxSAT instance $\mathcal{F}$ not only preserves the optimal cost; in addition, optimal solutions
of $\mathcal{F} \wedge C$ are also optimal solutions of $\mathcal{F}$. However, the converse need not hold:
an instance $\mathcal{F}$ might have optimal solutions that do not satisfy $C$.


Example 7. Consider an instance $\mathcal{F} = (F_H, B(\mathcal{F}), w)$ with $F_H = \{(b_1 \vee b_2)\}$,
$B(\mathcal{F}) = \{b_1, b_2\}$ and $w(b_1) = w(b_2) = 1$. The clause $C = (\neg b_1)$ is CPR wrt
$\mathcal{F}$. In order to see this, let $\tau = \{\neg b_1, b_2\}$. Then $\tau$ satisfies $C$ (condition (i) of
Definition 2). Furthermore, $\tau$ satisfies $F_H$, implying $F_H|_{\neg C} \vdash_1 F_H|_\tau$ (condition
(ii)). Finally, we have that $1 = \mathrm{COST}(\mathcal{F},\tau) = \mathrm{COST}(\mathcal{F} \wedge C,\tau) \le \mathrm{COST}(\mathcal{F},\neg C) = 1$
(condition (iii)). The assignment $\delta = \{b_1, \neg b_2\}$ is an example of an optimal
solution of $\mathcal{F}$ that is not a solution of $\mathcal{F} \wedge C$.
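The following brute-force replay of Example 7 is our own sketch, using a hypothetical `cost` helper. It checks the three conditions of Definition 2 for $\tau$, confirms that the optimal cost is preserved when $C$ is added, and shows that $\delta$ is optimal for $\mathcal{F}$ while falsifying $C$.

```python
import math
from itertools import product

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    return lit[1:] if lit.startswith("-") else lit

def cost(hard, weights, tau=frozenset()):
    """COST(F, tau) by brute force over the unassigned variables (inf if unsatisfiable)."""
    variables = {var(l) for c in hard for l in c} | {var(l) for l in weights}
    free = sorted(variables - {var(l) for l in tau})
    best = math.inf
    for bits in product([True, False], repeat=len(free)):
        full = set(tau) | {v if b else neg(v) for v, b in zip(free, bits)}
        if all(full & c for c in hard):
            best = min(best, sum(wt for l, wt in weights.items() if l in full))
    return best

F_H = [frozenset({"b1", "b2"})]
w = {"b1": 1, "b2": 1}
C = frozenset({"-b1"})
tau = {"-b1", "b2"}
delta = {"b1", "-b2"}

print(cost(F_H, w, tau), cost(F_H + [C], w, tau), cost(F_H, w, {"b1"}))  # 1 1 1
print(cost(F_H, w), cost(F_H + [C], w))                                  # 1 1
print(cost(F_H, w, delta) == cost(F_H, w), bool(delta & C))             # True False
```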


4 Propagation Redundancy and MCSes 


In this section, we analyze the effect of adding redundant clauses on the MCSes
of MaxSAT instances. As the main result, we show that adding CSPR (and by
extension CLPR) clauses to a MaxSAT instance $\mathcal{F}$ preserves all MCSes, while
adding CPR clauses does not in general. Stated in terms of solutions, this means
that adding CSPR clauses to $\mathcal{F}$ preserves not only all optimal solutions, but
all solutions $\tau$ for which $(\tau \cap B(\mathcal{F})) \in \mathrm{mcs}(\mathcal{F})$, while adding CPR clauses only
preserves at least one optimal solution.


Effect of CLPR Clauses on MCSes. MaxSAT-liftings of four specific SAT
solving techniques (including bounded variable elimination and self-subsuming
resolution) were earlier proposed in [8]. Notably, the correctness of the liftings
was shown separately for each of the techniques, by arguing that applying the
lifting does not change the set of MCSes of any MaxSAT instance. Towards a more
generic understanding of optimal-cost-preserving MaxSAT preprocessing, the notion
of solution resolution asymmetric tautologies (SRAT) was proposed in [10] as a
MaxSAT-lifting of the concept of resolution asymmetric tautologies (RAT). In short,
a clause $C$ is an SRAT clause for a MaxSAT instance $\mathcal{F} = (F_H, B(\mathcal{F}), w)$ if there is a
literal $l \in C \setminus B(\mathcal{F} \wedge C)$ such that $F_H \vdash_1 ((C \vee D) \setminus \{\neg l\})$ for every $D \in F_H$ for which $\neg l \in D$.
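The SRAT condition as summarized above can be tested with unit propagation only. The sketch below follows that one-sentence summary, so it may differ in details from the full definition in [10]. The function name `srat` and the test clauses are our own. On the instance of Example 1, $(\neg y)$ passes the test on $\neg y$, while $(y)$ fails it.

```python
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def restrict(formula, tau):
    """F|_tau: drop clauses satisfied by tau, then drop the literals tau falsifies."""
    tau = set(tau)
    return [frozenset(l for l in c if neg(l) not in tau) for c in formula if not (set(c) & tau)]

def up_conflict(formula):
    """Unit propagation; True iff it derives the empty clause."""
    formula = [frozenset(c) for c in formula]
    while not any(len(c) == 0 for c in formula):
        unit = next((c for c in formula if len(c) == 1), None)
        if unit is None:
            return False
        formula = restrict(formula, unit)
    return True

def implies_up(f1, f2):
    """F1 |-_1 F2."""
    return all(up_conflict(list(f1) + [frozenset({neg(l)}) for l in c]) for c in f2)

def srat(hard, blocking, clause):
    """Return a literal l of C outside blocking = B(F and C) such that F_H |-_1 the
    resolvent (C or D) minus {-l} for every hard clause D containing -l; else None."""
    for l in sorted(clause):
        if l in blocking:
            continue
        resolvents = [(frozenset(clause) | d) - {neg(l)} for d in hard if neg(l) in d]
        if implies_up(hard, resolvents):
            return l
    return None

F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
B = {"b1", "b2", "b3", "b4"}
print(srat(F_H, B, {"-y"}), srat(F_H, B, {"y"}))  # -y None
```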

In analogy with RAT [29], SRAT was shown in [10] to allow for a general
proof of correctness for natural MaxSAT-liftings of a wide range of SAT
preprocessing techniques, covering, among others, the four techniques for which individual
correctness proofs were provided in [8]. The generality follows essentially from
the fact that the addition and removal of SRAT clauses preserves MCSes. The
same observations apply to CLPR, as CLPR and SRAT are equivalent.


Proposition 3. A clause C is CLPR wrt F iff it is SRAT wrt F. 


The proof of Proposition 3 follows directly from corresponding results in the
context of SAT [23]. Informally speaking, a clause $C$ is SRAT on a literal $l$ iff it
is RAT [29] on $l$ and $l \notin B(\mathcal{F})$. Similarly, a clause $C$ is CLPR on a literal $l$ iff it is
LPR as defined in [23] on $l$ and $l \notin B(\mathcal{F})$. Proposition 3 together with previous
results from [10] implies that the MCSes of MaxSAT instances are preserved
under removing and adding CLPR clauses.


Corollary 2. If $C$ is CLPR wrt $\mathcal{F}$, then $\mathrm{mcs}(\mathcal{F}) = \mathrm{mcs}(\mathcal{F} \wedge C)$.


Effect of CPR Clauses on MCSes. We turn our attention to the effect of
CPR clauses on the MCSes of MaxSAT instances. Our analysis makes use of
the previously proposed MaxSAT-centric preprocessing rule known as subsumed
label elimination (SLE) [11,33]⁵.


Definition 3. (Subsumed Label Elimination [11,33]) Consider a MaxSAT
instance $\mathcal{F} = (F_H, B(\mathcal{F}), w)$ and a blocking literal $l \in B(\mathcal{F})$ for which $\neg l \notin \mathrm{lit}(F_H)$.
Assume that there is another blocking literal $l_s \in B(\mathcal{F})$ for which
(1) $\neg l_s \notin \mathrm{lit}(F_H)$, (2) $\{C \in F_H \mid l \in C\} \subseteq \{C \in F_H \mid l_s \in C\}$ and (3)
$w(l) \ge w(l_s)$. The subsumed label elimination (SLE) rule allows adding $(\neg l)$ to
$F_H$.
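Finding candidate pairs for SLE is a purely syntactic check. The sketch below uses a hypothetical function `sle_pairs` and lists all pairs $(l, l_s)$ of distinct blocking literals that meet conditions (1) to (3) together with the requirement $\neg l \notin \mathrm{lit}(F_H)$. On the instance of Example 1 it finds exactly the pair $(b_3, b_4)$.

```python
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def sle_pairs(hard, weights):
    """All (l, l_s) meeting Definition 3: neither -l nor -l_s occurs in F_H, every
    hard clause containing l also contains l_s, and w(l) >= w(l_s)."""
    lits = {x for c in hard for x in c}
    pairs = []
    for l in sorted(weights):
        for ls in sorted(weights):
            if l == ls or neg(l) in lits or neg(ls) in lits:
                continue
            with_l = {c for c in hard if l in c}
            with_ls = {c for c in hard if ls in c}
            if with_l <= with_ls and weights[l] >= weights[ls]:
                pairs.append((l, ls))
    return pairs

F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
w = {"b1": 1, "b2": 2, "b3": 8, "b4": 1}
print(sle_pairs(F_H, w))  # [('b3', 'b4')]
```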


A specific proof of correctness of SLE was given in [11]. The following proposition 
provides an alternative proof based on CPR. 


Proposition 4 (Proof of correctness for SLE). Let $\mathcal{F}$ be a MaxSAT
instance and assume that the blocking literals $l, l_s \in B(\mathcal{F})$ satisfy the three
conditions of Definition 3. Then the clause $C = (\neg l)$ is CPR wrt $\mathcal{F}$.


⁵ Rephrased here using our notation.
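Proof. We show that $\tau = \{\neg l, l_s\}$ satisfies the three conditions of Definition 2.
First, $\tau$ satisfies $C$ (condition (i)). Conditions (1) and (2) of Definition 3 imply
$F_H|_\tau \subseteq F_H|_{\neg C}$, which in turn implies $F_H|_{\neg C} \vdash_1 F_H|_\tau$ (condition (ii)).

As for condition (iii), the requirement $\mathrm{COST}(\mathcal{F} \wedge C,\tau) = \mathrm{COST}(\mathcal{F},\tau)$ follows
from $B(\mathcal{F} \wedge C) = B(\mathcal{F})$. Let $\delta \supseteq \neg C$ be a complete assignment of $F_H$ for which
$\mathrm{COST}(\mathcal{F},\delta) = \mathrm{COST}(\mathcal{F},\neg C)$. If $\mathrm{COST}(\mathcal{F},\delta) = \infty$, then $\mathrm{COST}(\mathcal{F},\tau) \le \mathrm{COST}(\mathcal{F},\neg C)$
follows trivially. Otherwise $\delta \setminus \neg C$ satisfies $F_H|_{\neg C}$, so by $F_H|_{\neg C} \models F_H|_\tau$ it
satisfies $F_H|_\tau$ as well. Thus $\delta^\tau = ((\delta \setminus \neg C) \setminus \neg\tau) \cup \tau = (\delta \setminus \{l, \neg l_s\}) \cup \{\neg l, l_s\}$
is an extension of $\tau$ that satisfies $F_H$ and for which $\mathrm{COST}(\mathcal{F},\tau) \le \mathrm{COST}(\mathcal{F},\delta^\tau) \le
\mathrm{COST}(\mathcal{F},\delta)$ by condition (3) of Definition 3. Thereby $\tau$ satisfies the conditions of
Definition 2, so $C$ is CPR wrt $\mathcal{F}$.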


Example 8. The blocking literals $b_3, b_4 \in B(\mathcal{F})$ of the instance $\mathcal{F}$ detailed in
Example 1 satisfy the conditions of Definition 3 with $l = b_3$ and $l_s = b_4$. By
Proposition 4 the clause $(\neg b_3)$ is CPR wrt $\mathcal{F}$.
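The assignment $\tau = \{\neg b_3, b_4\}$ from the proof of Proposition 4 can be checked on Example 8. The sketch below uses hypothetical helpers. Following the proof, it verifies condition (ii) through the inclusion $F_H|_\tau \subseteq F_H|_{\neg C}$, and it computes the three costs of condition (iii) by brute force.

```python
import math
from itertools import product

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    return lit[1:] if lit.startswith("-") else lit

def restrict(formula, tau):
    """F|_tau: drop clauses satisfied by tau, then drop the literals tau falsifies."""
    tau = set(tau)
    return [frozenset(l for l in c if neg(l) not in tau) for c in formula if not (set(c) & tau)]

def cost(hard, weights, tau=frozenset()):
    """COST(F, tau) by brute force over the unassigned variables (inf if unsatisfiable)."""
    variables = {var(l) for c in hard for l in c} | {var(l) for l in weights}
    free = sorted(variables - {var(l) for l in tau})
    best = math.inf
    for bits in product([True, False], repeat=len(free)):
        full = set(tau) | {v if b else neg(v) for v, b in zip(free, bits)}
        if all(full & c for c in hard):
            best = min(best, sum(wt for l, wt in weights.items() if l in full))
    return best

F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
w = {"b1": 1, "b2": 2, "b3": 8, "b4": 1}
C = frozenset({"-b3"})
not_C = {"b3"}
tau = {"-b3", "b4"}

cond_i = bool(tau & C)
cond_ii = set(restrict(F_H, tau)) <= set(restrict(F_H, not_C))  # inclusion gives |-_1
costs = (cost(F_H + [C], w, tau), cost(F_H, w, tau), cost(F_H, w, not_C))
print(cond_i, cond_ii, costs)  # True True (2, 2, 9)
```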


In [11] it was shown that SLE does not preserve MCSes in general. By Corollary 2,
this implies that SLE cannot be viewed as the addition of CLPR clauses.
Furthermore, by Proposition 4 we obtain the following.


Corollary 3. There is a MaxSAT instance $\mathcal{F}$ and a clause $C$ that is CPR wrt
$\mathcal{F}$ for which $\mathrm{mcs}(\mathcal{F}) \neq \mathrm{mcs}(\mathcal{F} \wedge C)$.
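Corollary 3 can be observed directly on Example 1. The sketch below uses a hypothetical brute-force `mcses` helper. It shows that adding the CPR clause $(\neg b_3)$ obtained by SLE shrinks the set of MCSes from four to two; in particular, $\{b_2, b_3\}$ is lost. A hard unit clause introduces no blocking literals, so the blocking set stays the same.

```python
from itertools import combinations, product

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    return lit[1:] if lit.startswith("-") else lit

def satisfiable(clauses):
    """Brute-force SAT check (tiny formulas only)."""
    variables = sorted({var(l) for c in clauses for l in c})
    for bits in product([True, False], repeat=len(variables)):
        tau = {v if b else neg(v) for v, b in zip(variables, bits)}
        if all(tau & c for c in clauses):
            return True
    return False

def mcses(hard, blocking):
    """Subset-minimal cs such that F_H plus (-l) for every l in blocking but not in cs is SAT."""
    def feasible(cs):
        return satisfiable(list(hard) + [frozenset({neg(l)}) for l in blocking if l not in cs])
    found = []
    for k in range(len(blocking) + 1):
        for cs in map(frozenset, combinations(sorted(blocking), k)):
            if feasible(cs) and not any(m < cs for m in found):
                found.append(cs)
    return set(found)

F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
B = {"b1", "b2", "b3", "b4"}
C = frozenset({"-b3"})
before = mcses(F_H, B)
after = mcses(F_H + [C], B)
print(sorted(sorted(m) for m in before))  # [['b1', 'b3'], ['b1', 'b4'], ['b2', 'b3'], ['b2', 'b4']]
print(sorted(sorted(m) for m in after))   # [['b1', 'b4'], ['b2', 'b4']]
```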


Effect of CSPR Clauses on MCSes. Having established that CLPR clauses 
preserve MCSes while CPR clauses do not, we complete the analysis by demon- 
strating that CSPR clauses preserve MCSes. 


Theorem 2. Let $\mathcal{F}$ be a MaxSAT instance and $C$ a CSPR clause of $\mathcal{F}$. Then
$\mathrm{mcs}(\mathcal{F}) = \mathrm{mcs}(\mathcal{F} \wedge C)$.


Theorem 2 follows from the following lemmas and propositions. In the
following, let $C$ be a clause that is CSPR wrt a MaxSAT instance $\mathcal{F}$ on a set
$L \subseteq C \setminus B(\mathcal{F} \wedge C)$.


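Lemma 2. Let $cs \subseteq B(\mathcal{F})$. If $F_H \wedge \bigwedge_{l \in B(\mathcal{F}) \setminus cs} (\neg l)$ is satisfiable, then
$(F_H \wedge C) \wedge \bigwedge_{l \in B(\mathcal{F} \wedge C) \setminus cs} (\neg l)$ is satisfiable.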


Lemma 2 helps in establishing one direction of Theorem 2.
Proposition 5. $\mathrm{mcs}(\mathcal{F}) \subseteq \mathrm{mcs}(\mathcal{F} \wedge C)$.


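Proof. Let $cs \in \mathrm{mcs}(\mathcal{F})$. Then $F_H \wedge \bigwedge_{l \in B(\mathcal{F}) \setminus cs} (\neg l)$ is satisfiable, which by
Lemma 2 implies that $(F_H \wedge C) \wedge \bigwedge_{l \in B(\mathcal{F} \wedge C) \setminus cs} (\neg l)$ is satisfiable.
To show that $(F_H \wedge C) \wedge \bigwedge_{l \in B(\mathcal{F} \wedge C) \setminus cs_s} (\neg l)$ is unsatisfiable for any $cs_s \subsetneq cs \subseteq B(\mathcal{F})$,
we note that any assignment satisfying $(F_H \wedge C) \wedge \bigwedge_{l \in B(\mathcal{F} \wedge C) \setminus cs_s} (\neg l)$ would
also satisfy $F_H \wedge \bigwedge_{l \in B(\mathcal{F}) \setminus cs_s} (\neg l)$, contradicting $cs \in \mathrm{mcs}(\mathcal{F})$.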


The following lemma is useful for showing inclusion in the other direction. 
Lemma 3. Let $cs \in \mathrm{mcs}(\mathcal{F} \wedge C)$. Then $cs \subseteq B(\mathcal{F})$.
Lemma 3 allows for completing the proof of Theorem 2.
Proposition 6. $\mathrm{mcs}(\mathcal{F} \wedge C) \subseteq \mathrm{mcs}(\mathcal{F})$.


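Proof. Let $cs \in \mathrm{mcs}(\mathcal{F} \wedge C)$, which by Lemma 3 implies $cs \subseteq B(\mathcal{F})$. Let $\tau$ be
a solution that satisfies $(F_H \wedge C) \wedge \bigwedge_{l \in B(\mathcal{F} \wedge C) \setminus cs} (\neg l)$. Then $\tau$ satisfies
$F_H \wedge \bigwedge_{l \in B(\mathcal{F}) \setminus cs} (\neg l)$. For contradiction, assume that $F_H \wedge \bigwedge_{l \in B(\mathcal{F}) \setminus cs_s} (\neg l)$ is satisfiable
for some $cs_s \subsetneq cs$. Then by Lemma 2, $(F_H \wedge C) \wedge \bigwedge_{l \in B(\mathcal{F} \wedge C) \setminus cs_s} (\neg l)$ is satisfiable
as well, contradicting $cs \in \mathrm{mcs}(\mathcal{F} \wedge C)$. Thereby $cs \in \mathrm{mcs}(\mathcal{F})$.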


Theorem 2 implies that SLE cannot be viewed as the addition of CSPR
clauses. In light of this, an interesting remark is that, in contrast to CPR clauses
in general (recall Example 6), the assignment $\tau$ used in the proof of Proposition 4
can be used to convert any assignment that does not satisfy the CPR
clause detailed in Definition 3 into one that does, without increasing its cost.


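Observation 4 Let $\mathcal{F}$ be a MaxSAT instance and assume that the blocking
literals $l, l_s \in B(\mathcal{F})$ satisfy the three conditions of Definition 3. Let $\tau = \{\neg l, l_s\}$
and consider any solution $\delta \supseteq \neg C$ of $\mathcal{F}$ that does not satisfy the CPR clause
$C = (\neg l)$. Then $\delta_\tau$ is a solution of $\mathcal{F} \wedge C$ for which $\mathrm{COST}(\mathcal{F},\delta_\tau) \le \mathrm{COST}(\mathcal{F},\delta)$.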
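Observation 4 can be sanity-checked exhaustively on Example 1 with $l = b_3$ and $l_s = b_4$. The sketch below uses a hypothetical function `check_observation4`. It enumerates every complete solution $\delta$ with $\delta(b_3) = 1$, forms $\delta_\tau$, and checks that $\delta_\tau$ satisfies $F_H \wedge (\neg b_3)$ without costing more. A brute-force check on one instance illustrates the observation but does not prove it.

```python
from itertools import product

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def var(lit):
    return lit[1:] if lit.startswith("-") else lit

def satisfies(tau, formula):
    return all(set(tau) & set(c) for c in formula)

def update(tau, lits):
    """tau_L = (tau minus the negations of L) union L."""
    return (set(tau) - {neg(l) for l in lits}) | set(lits)

def total(weights, tau):
    """Cost of a complete assignment: weights of the blocking literals set to 1."""
    return sum(wt for l, wt in weights.items() if l in tau)

def check_observation4(hard, weights, l, ls):
    """Return (number of solutions delta containing l, list of counterexamples)."""
    tau, C = {neg(l), ls}, frozenset({neg(l)})
    variables = sorted({var(x) for c in hard for x in c})
    checked, failures = 0, []
    for bits in product([True, False], repeat=len(variables)):
        delta = {v if b else neg(v) for v, b in zip(variables, bits)}
        if l in delta and satisfies(delta, hard):      # a solution falsifying C
            checked += 1
            d_tau = update(delta, tau)
            if not (satisfies(d_tau, list(hard) + [C])
                    and total(weights, d_tau) <= total(weights, delta)):
                failures.append(delta)
    return checked, failures

F_H = [frozenset(c) for c in (["x", "b1"], ["-x", "b2"], ["y", "b3", "b4"],
                               ["z", "-y", "b4"], ["-z"])]
w = {"b1": 1, "b2": 2, "b3": 8, "b4": 1}
print(check_observation4(F_H, w, "b3", "b4"))  # (12, [])
```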


5 CPR-Based Preprocessing for MaxSAT 


Mapping the theoretical observations into practical preprocessing, in this section 
we discuss through examples how CPR clauses can be used as a unified theoret- 
ical basis for capturing a wide variety of known MaxSAT reasoning rules, and 
how they could potentially help in the development of novel MaxSAT reasoning 
techniques. 

Our first example is the so-called hardening rule [2,8,17,26]. In terms of our
notation, given a solution $\tau$ to a MaxSAT instance $F = (F_H, B(F), w)$ and a
blocking literal $l \in B(F)$ for which $w(l) > \mathrm{COST}(F, \tau)$, the hardening rule allows
adding the clause $C = (\neg l)$ to $F_H$.

The correctness of the hardening rule can be established with CPR clauses.
More specifically, as $\mathrm{COST}(F, \tau) < w(l)$, it follows that $\tau(l) = 0$ and hence $\tau(C) = 1$ (condition (i)
of Definition 2). Since $\tau$ satisfies F, we have that $F_H|_\tau = \top$, so $F_H|_{\neg C} \vdash_1 F_H|_\tau$
(condition (ii)). Finally, as $\mathrm{COST}(F, \delta) \geq w(l) > \mathrm{COST}(F, \tau)$ holds for all $\delta \supseteq \neg C$,
it follows that $\mathrm{COST}(F, \neg C) > \mathrm{COST}(F, \tau) = \mathrm{COST}(F \wedge C, \tau)$. As such, $(\neg l)$ is a
CPR clause wrt F. In fact, instead of assuming $w(l) > \mathrm{COST}(F, \tau)$, it suffices to
assume $w(l) \geq \mathrm{COST}(F, \tau)$ and $\tau(l) = 0$.

The hardening rule can not be viewed as the addition of CSPR or CLPR 
clauses because it does not in general preserve MCSes. 


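Example 9. Consider the MaxSAT instance F from Example 1 and a solution
$\tau = \{b_1, b_2, b_4, \neg b_3, \neg z, x, y\}$. Since $\mathrm{COST}(F, \tau) = 3 < 8 = w(b_3)$, the clause $C = (\neg b_3)$
is CPR. However, $\mathrm{mcs}(F) \neq \mathrm{mcs}(F \wedge C)$ since the set $\{b_2, b_3\} \in \mathrm{mcs}(F)$ is not
an MCS of $F \wedge C$, as $(F_H \wedge C) \wedge \bigwedge_{l \in B(F \wedge C) \setminus \{b_2, b_3\}} (\neg l) = (F_H \wedge (\neg b_3)) \wedge (\neg b_1) \wedge (\neg b_4)$
is not satisfiable.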
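The hardening rule and its effect on MCSes can be checked mechanically on tiny instances. The sketch below is a brute-force illustration, not MaxPre code: it assumes a DIMACS-style encoding (clauses are lists of non-zero ints, blocking literals are ints whose truth incurs their weight), and the helper names `harden` and `mcs` are hypothetical. Since Example 1 is not reproduced here, the demo uses a separate hypothetical instance $F_H = \{(b_1 \vee b_2)\}$ with $w(b_1) = 1$ and $w(b_2) = 5$, where hardening drops the MCS $\{b_2\}$ while preserving the optimal cost.

```python
from itertools import combinations, product

# Assumed toy encoding: clauses are lists of non-zero ints (DIMACS style);
# a blocking literal set to true incurs its weight.

def satisfiable(clauses, nvars):
    """Brute-force satisfiability check over variables 1..nvars (tiny inputs only)."""
    for bits in product([False, True], repeat=nvars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def cost(tau, weights):
    """COST(F, tau): total weight of the blocking literals that tau sets to true."""
    return sum(w for l, w in weights.items() if l in tau)

def harden(hard, weights, tau):
    """Hardening: add (-l) for each blocking literal l with w(l) > COST(F, tau)."""
    bound = cost(tau, weights)
    return hard + [[-l] for l, w in weights.items() if w > bound]

def mcs(hard, blocking):
    """All minimal cs within B(F) such that F_H plus units (-l), l in B(F) - cs, is SAT."""
    nvars = max(abs(l) for c in hard + [blocking] for l in c)
    found = []
    for k in range(len(blocking) + 1):            # smallest candidate sets first
        for cs in combinations(blocking, k):
            if any(set(m) <= set(cs) for m in found):
                continue                          # contains a smaller correction set
            units = [[-l] for l in blocking if l not in cs]
            if satisfiable(hard + units, nvars):
                found.append(cs)
    return [set(m) for m in found]

# Hypothetical instance: F_H = {(b1 v b2)}, w(b1) = 1, w(b2) = 5, with b1 = 1, b2 = 2.
hard, weights = [[1, 2]], {1: 1, 2: 5}
tau = {1, -2}                                     # a solution of cost 1
hardened = harden(hard, weights, tau)             # adds (-b2) since 5 > 1
print(mcs(hard, [1, 2]), mcs(hardened, [1, 2]))   # [{1}, {2}] [{1}]
```

Enumerating candidate sets by increasing size is what makes the skip test sound: correction sets are closed under supersets, so any non-minimal one contains a smaller set found earlier.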
Viewing the hardening rule through the lens of CPR clauses demonstrates
novel aspects of the MaxSAT-liftings of propagation redundancy. In particular,
instantiated in the context of SAT, an argument similar to the one we made for
hardening shows that given a CNF formula F, an assignment $\tau$ satisfying F, and
a literal $l$ for which $\tau(l) = 0$, the clause $(\neg l)$ is redundant (wrt equisatisfiability).
While formally correct, such a rule is not very useful for SAT solving. In contrast,
in the context of MaxSAT the hardening rule is employed in various modern
MaxSAT solvers and leads to non-trivial performance improvements [4,5].

As another example of capturing MaxSAT-centric reasoning with CPR, consider the so-called TrimMaxSAT rule [39]. Given a MaxSAT instance $F = (F_H, B(F), w)$ and a literal $l \in B(F)$ for which $\tau(l) = 1$ for all solutions $\tau$ of
F, the TrimMaxSAT rule allows adding the clause $C = (l)$ to $F_H$. In this case
the assumptions imply that all solutions of F also satisfy C, i.e., that $F_H|_{\neg C}$ is
unsatisfiable. As such, any assignment $\tau$ that satisfies C and $F_H$ will also satisfy
the three conditions of Definition 2, which demonstrates that C is CPR. It is,
however, not CSPR since the only literal in C is blocking.
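The condition of the TrimMaxSAT rule can be illustrated by enumerating all solutions of a tiny instance. This is only a sketch under the same assumed DIMACS-style encoding as above; `trim_maxsat` is a hypothetical name, and a practical implementation would use SAT solver calls rather than enumeration.

```python
from itertools import product

def trim_maxsat(hard, blocking, nvars):
    """Add the unit clause (l) for every blocking literal l that is true in every
    solution of F_H. Solutions are enumerated by brute force (tiny inputs only)."""
    def holds(bits, lit):
        return bits[abs(lit) - 1] == (lit > 0)
    sols = [bits for bits in product([False, True], repeat=nvars)
            if all(any(holds(bits, l) for l in c) for c in hard)]
    if not sols:
        return hard                  # F_H unsatisfiable: leave it unchanged here
    forced = [l for l in blocking if all(holds(s, l) for s in sols)]
    return hard + [[l] for l in forced]

# Hypothetical instance: F_H = {(b1 v x), (b1 v -x)} forces b1; b2 is unconstrained.
# Variables: b1 = 1, b2 = 2, x = 3.
print(trim_maxsat([[1, 3], [1, -3]], [1, 2], 3))   # [[1, 3], [1, -3], [1]]
```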

As a third example of capturing (new) reasoning techniques with CPR, con- 
sider an extension of the central variable elimination rule that allows (to some 
extent) for eliminating blocking literals. 


Definition 4. Consider a MaxSAT instance F and a blocking literal $l \in B(F)$.
Let BBVE(F) be the instance obtained by (i) adding the clause $C \vee D$ to F for
every pair $(C \vee l), (D \vee \neg l) \in F_H$ and (ii) removing all clauses $(D \vee \neg l) \in F_H$.
Then $\mathrm{COST}(F) = \mathrm{COST}(\mathrm{BBVE}(F))$ and $\mathrm{mcs}(F) = \mathrm{mcs}(\mathrm{BBVE}(F))$.
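The transformation of Definition 4 is a resolution step on the blocking literal. A minimal sketch on DIMACS-style clauses (an assumed encoding; `bbve` is a hypothetical name) follows. Skipping tautological resolvents is a choice made here, not stated in Definition 4; it is harmless because such clauses are satisfied by every assignment.

```python
def bbve(hard, l):
    """Definition 4: add C v D for every pair (C v l), (D v -l) of hard clauses,
    then drop every clause (D v -l). Clauses (C v l) are kept. Tautological
    resolvents are skipped (an implementation choice of this sketch)."""
    pos = [c for c in hard if l in c]
    neg = [c for c in hard if -l in c]
    result = [c for c in hard if -l not in c]
    for c in pos:
        for d in neg:
            r = sorted({x for x in c if x != l} | {x for x in d if x != -l}, key=abs)
            if any(-x in r for x in r) or r in result:
                continue             # tautology or duplicate clause
            result.append(r)
    return result

# Hypothetical instance with blocking literal b = 3:
# F_H = {(x1 v b), (x2 v -b), (-x1 v -x2)}.
print(bbve([[1, 3], [2, -3], [-1, -2]], 3))   # [[1, 3], [-1, -2], [1, 2]]
```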


On the Limitations of CPR. Finally, we note that while CPR clauses sig- 
nificantly generalize existing theory on reasoning and preprocessing rules for 
MaxSAT, there are known reasoning techniques that can not (at least straight- 
forwardly) be viewed through the lens of propagation redundancy. For a concrete 
example, consider the so-called intrinsic atmost1 technique [26]. 


Definition 5. Consider a MaxSAT instance F and a set $L \subseteq B(F)$ of blocking
literals. Assume that (i) $|\tau \cap \{\neg l \mid l \in L\}| \leq 1$ holds for any solution $\tau$ of F and
(ii) $w(l) = 1$ for each $l \in L$. Now form the instance AT-MOST-ONE(F, L) by
(i) removing each literal $l \in L$ from $B(F)$, and (ii) adding the clause $\{\neg l \mid l \in L\} \cup \{l_L\}$ to $F_H$, where $l_L$ is a fresh blocking literal with $w(l_L) = 1$.


It has been established that any optimal solution of AT-MOST-ONE(F, L) 
is an optimal solution of F [26]. However, as the next example demonstrates, 
the preservation of optimal solutions is in general not due to the clauses added 
being redundant, as applying the technique can affect optimal cost. 


Example 10. Consider the MaxSAT instance $F = (F_H, B(F), w)$ with $F_H = \{(l_i) \mid i = 1, \ldots, n\}$, $B(F) = \{l_1, \ldots, l_n\}$ and $w(l) = 1$ for all $l \in B(F)$. Then $|\tau \cap \neg B(F)| = 0 \leq 1$ holds for all solutions $\tau$ of F, so the intrinsic-at-most-one technique can be used to obtain the instance $F' = \text{AT-MOST-ONE}(F, B(F)) = (F'_H, B(F'), w')$ with $F'_H = F_H \cup \{(\neg l_1 \vee \ldots \vee \neg l_n \vee l_L)\}$, $B(F') = \{l_L\}$ and
$w'(l_L) = 1$. Now $\delta = \{l \mid l \in B(F)\} \cup \{l_L\}$ is an optimal solution to both $F'$
and F for which $1 = \mathrm{COST}(F', \delta) < \mathrm{COST}(F, \delta) = n$.
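Example 10 can be replayed numerically for a concrete n. The sketch below instantiates it with n = 4 (an arbitrary choice) under the DIMACS-style encoding used above; `at_most_one` is a hypothetical helper that builds AT-MOST-ONE(F, L) but does not check conditions (i) and (ii) of Definition 5, which hold here by construction.

```python
def cost(assignment, weights):
    """COST: total weight of the blocking literals set to true by the assignment."""
    return sum(w for l, w in weights.items() if l in assignment)

def at_most_one(hard, weights, group, fresh):
    """AT-MOST-ONE(F, L) per Definition 5; conditions (i) and (ii) are assumed,
    not checked. Removes L from B(F) and adds the clause {-l | l in L} + {l_L}."""
    new_hard = hard + [[-l for l in group] + [fresh]]
    new_weights = {l: w for l, w in weights.items() if l not in group}
    new_weights[fresh] = 1
    return new_hard, new_weights

n = 4                                             # Example 10 with n = 4
hard = [[i] for i in range(1, n + 1)]             # F_H = {(l_1), ..., (l_n)}
weights = {i: 1 for i in range(1, n + 1)}         # unit weights on B(F)
hard2, weights2 = at_most_one(hard, weights, list(range(1, n + 1)), n + 1)
delta = set(range(1, n + 2))                      # every l_i and l_L set to true
assert all(any(l in delta for l in c) for c in hard2)   # delta satisfies F'_H
print(cost(delta, weights), cost(delta, weights2))      # 4 1
```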


Example 10 implies that the intrinsic atmost1 technique can not be viewed as 
the addition or removal of redundant clauses. Generalizing CPR to cover weight 
changes could lead to further insights especially due to potential connections 
with core-guided MaxSAT solving [1,36-38]. 


6 MaxPre 2: More General Preprocessing in Practice


Connecting to practice, we extended the MaxSAT preprocessor MaxPre [33] 
version 1 with support for techniques captured by propagation redundancy. The 
resulting MaxPre version 2, as outlined in the following, hence includes tech- 
niques which have previously only been implemented in specific solver imple- 
mentations rather than in general-purpose MaxSAT preprocessors. 

First, let us mention that the earlier MaxPre [33] version 1 assumes that
any blocking literals only appear in a single polarity among the hard clauses.
Removing this assumption, which is supported by the theory developed in Sects. 3–4,
decreases the number of auxiliary variables that need to be introduced when
a MaxSAT instance is rewritten to only include unit soft clauses. For example, consider a MaxSAT instance F with $F_H = \{(\neg x \vee y), (\neg y \vee x)\}$ and
$F_S = \{(x), (\neg y)\}$. For preprocessing the instance, MaxPre 1 extends both soft
clauses with a new, auxiliary variable and runs preprocessing on the instance
$F = \{(\neg x \vee y), (\neg y \vee x), (x \vee b_1), (\neg y \vee b_2)\}$ with $B(F) = \{b_1, b_2\}$. In contrast,
MaxPre 2 detects that the clauses in $F_S$ are unit and reuses them as blocking
literals, invoking preprocessing on $F = \{(\neg x \vee y), (\neg y \vee x)\}$ with $B(F) = \{\neg x, y\}$.
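The difference between the two rewritings can be reproduced on the example above with x = 1 and y = 2. The functions below are hypothetical illustrations of the two strategies, not MaxPre's code; weights carry over to the corresponding blocking literals and are omitted. A unit soft clause (l) is falsified exactly when ¬l is true, so ¬l can serve directly as its blocking literal.

```python
def maxpre1_style(hard, soft, nvars):
    """Extend every soft clause C with a fresh variable b: C v b is hard, b is blocking."""
    new_hard, blocking = list(hard), []
    for c in soft:
        nvars += 1
        new_hard.append(c + [nvars])
        blocking.append(nvars)
    return new_hard, blocking

def maxpre2_style(hard, soft, nvars):
    """Reuse a unit soft clause (l) via the blocking literal -l, which is true
    exactly when the soft clause is falsified; other soft clauses get a fresh b."""
    new_hard, blocking = list(hard), []
    for c in soft:
        if len(c) == 1:
            blocking.append(-c[0])
        else:
            nvars += 1
            new_hard.append(c + [nvars])
            blocking.append(nvars)
    return new_hard, blocking

hard, soft = [[-1, 2], [-2, 1]], [[1], [-2]]
print(maxpre1_style(hard, soft, 2))  # ([[-1, 2], [-2, 1], [1, 3], [-2, 4]], [3, 4])
print(maxpre2_style(hard, soft, 2))  # ([[-1, 2], [-2, 1]], [-1, 2])
```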

In addition to the techniques already implemented in MaxPre 1, MaxPre
2 includes the following additional techniques: hardening [2], a variant of TrimMaxSAT [39] that works on all literals of a MaxSAT instance, the intrinsic
atmost1 technique [26] and a MaxSAT-lifting of failed literal elimination [12]. In
short, failed literal elimination adds the clause $(\neg l)$ to the hard clauses $F_H$ of
an instance in case unit propagation derives a conflict in $F_H \wedge \{(l)\}$. Additionally, the implementation of failed literal elimination attempts to identify implied
equivalences between literals that can lead to further simplification.
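The failed-literal check described above can be sketched with a naive unit propagation loop. This is a hypothetical illustration of the SAT-level check only (DIMACS-style clauses, function names invented here); it does not model the equivalence detection that MaxPre 2 additionally performs.

```python
def propagates_to_conflict(clauses, lit):
    """Unit propagation on clauses plus the unit (lit); True if a clause is falsified."""
    assigned = {lit}
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if any(l in assigned for l in c):
                continue                              # clause already satisfied
            open_lits = [l for l in c if -l not in assigned]
            if not open_lits:
                return True                           # all literals false: conflict
            if len(open_lits) == 1:
                assigned.add(open_lits[0])            # unit clause: propagate
                changed = True
    return False

def failed_literal_elimination(hard, candidates):
    """Add (-l) to the hard clauses for every candidate literal l that fails."""
    result = list(hard)
    for lit in candidates:
        if propagates_to_conflict(result, lit):
            result.append([-lit])
    return result

# Hypothetical F_H = {(-a v b), (-a v c), (-b v -c)} with a, b, c = 1, 2, 3:
print(failed_literal_elimination([[-1, 2], [-1, 3], [-2, -3]], [1, 2]))
# [[-1, 2], [-1, 3], [-2, -3], [-1]]
```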

For computing the solutions required by TrimMaxSAT and detecting the cardinality constraints required by intrinsic-at-most-one constraints, MaxPre 2 uses
the Glucose 3.0 SAT solver [3]. For computing solutions required by hardening,
MaxPre 2 additionally uses the SatLike incomplete MaxSAT solver [34] within
preprocessing. MaxPre 2 is available in open source at https://bitbucket.org/coreo-group/maxpre2/.

We emphasize that, while the additional techniques implemented by MaxPre
2 have been previously implemented as heuristics in specific solver implementations, MaxPre 2 is, to the best of our understanding, the first stand-alone
implementation supporting techniques whose correctness cannot be established
with previously-proposed MaxSAT redundancy notions (i.e., SRAT). The goal
of our empirical evaluation presented in the next section is to demonstrate the
potential of viewing expressive reasoning techniques not only as solver heuristics,
but as a separate step in the MaxSAT solving process whose correctness can be
established via propagation redundancy.


7 Empirical Evaluation 


We report on results from an experimental evaluation of the potential of incor- 
porating more general reasoning in MaxSAT preprocessing. In particular, we 
evaluated both complete solvers (geared towards finding provably-optimal solu- 
tions) and incomplete solvers (geared towards finding relatively good solutions 
fast) on standard heterogeneous benchmarks from recent MaxSAT Evaluations.
All experiments were run on 2.60-GHz Intel Xeon E5-2670 8-core machines with 
64 GB memory and CentOS 7. All reported runtimes include the time used in 
preprocessing (when applicable). 


7.1 Impact of Preprocessing on Complete Solvers 


We start by considering recent representative complete solvers covering three 
central MaxSAT solving paradigms: the core-guided solver CGSS [27] (as a recent 
improvement to the successful RC2 solver [26]), and the MaxSAT Evaluation 
2021 versions of the implicit hitting set based solver MaxHS [17] and the solution- 
improving solver Pacose [40]. For each solver S we consider the following variants. 


– S: S in its default configuration.

– S-no-preprocess: S with the solver’s own internal preprocessing turned off
(when applicable).

– S+maxpre1: S after applying MaxPre 1 using its default configuration.

– S+maxpre2/none: S after applying MaxPre 2 using the default configuration
of MaxPre 1.

– S+maxpre2/<TECH>: S after applying MaxPre 2 using the default configuration of MaxPre 1 and the additional techniques integrated into MaxPre 2 (as
detailed in Sect. 6) specified by <TECH>.


More precisely, <TECH> specifies which of the techniques HTVGR are applied: 
H for hardening, T and V for TrimMaxSAT on blocking and non-blocking liter- 
als, respectively, G for intrinsic-at-most-one-constraints and R for failed literal 
elimination. It should be noted that an exhaustive evaluation of all subsets and 
application orders of these techniques is infeasible in practice. Based on prelim- 
inary experiments, we observed that the following choices were promising: HRT 
for CGSS and MaxHS, and HTVGR for Pacose; we report results using these 
individual configurations. 

As benchmarks, we used the combined set of weighted instances from the
complete tracks of MaxSAT Evaluation 2020 and 2021. After removing duplicates, this gave a total of 1117 instances. We enforced a per-instance time limit of
60 minutes and a memory limit of 32 GB. Furthermore, we enforced a per-instance
120-second time limit on preprocessing.

Fig. 1. Impact of preprocessing on complete solvers. For each solver, the number of
instances solved within a 60-min per-instance time limit is given in parentheses.

An overview of the results is shown in Fig. 1, illustrating for each solver
the number of instances solved (x-axis) under different per-instance time limits (y-axis). We observe that for both CGSS and MaxHS, S+maxpre1 and
S+maxpre2/none lead to fewer instances solved compared to S. In contrast,
S+maxpre2/HRT, i.e., incorporating the stronger reasoning techniques of MaxPre 2, performs best of all preprocessing variants and improves on MaxHS
also in terms of the number of instances solved. For Pacose, we observe that
both Pacose+maxpre1 and Pacose+maxpre2/none (without the stronger reasoning techniques) already improve the performance of Pacose, leading to more
instances solved. Incorporating the stronger reasoning rules further significantly
improves performance, with Pacose+maxpre2/HVRTG performing the best among
all of the Pacose variants.


7.2 Impact of Preprocessing on Incomplete MaxSAT Solving 


Table 1. Impact of preprocessing on the incomplete solver Loandra. The wins are
organized column-wise: the cell on row X, column Y contains the total number of
instances that the solver on column Y wins over the solver on row X.

#Wins          | base (maxpre1) | no-prepro | maxpre2/none | maxpre2/VG
base (maxpre1) | –              | 154       | 135          | 152
no-prepro      | 208            | –         | 216          | 218
maxpre2/none   | 105            | 143       | –            | 77
maxpre2/VG     | 110            | 140       | 80           | –
Score (avg)    | 0.852          | 0.840     | 0.863        | 0.870

As a representative incomplete MaxSAT solver we consider the MaxSAT Evaluation 2021 version of Loandra [9], as the best-performing solver in the incomplete
track of MaxSAT Evaluation under a 300s per-instance time limit on weighted
instances. Loandra combines core-guided and solution-improving search towards
finding good solutions fast. We consider the following variants of Loandra.


– base (maxpre1): Loandra in its default configuration, which makes use of
MaxPre 1.

– no-prepro: Loandra with its internal preprocessing turned off.

– maxpre2/none: base with its internal preprocessor changed from MaxPre 1
to MaxPre 2 using the default configuration of MaxPre 1.

– maxpre2/VG: maxpre2/none additionally incorporating the intrinsic-at-most-one
constraints technique and the extension of TrimMaxSAT to non-blocking literals
(cf. Sect. 6), found promising in preliminary experimentation.


As benchmarks, we used the combined set of weighted instances from the 
incomplete tracks of MaxSAT Evaluation 2020 and 2021. After removing dupli- 
cates, this gave a total of 451 instances. When reporting results, we consider for 
each instance and solver the cost of the best solution found by the solver within 
300s (including time spent preprocessing and solution reconstruction). 

We compare the relative runtime performance of the solver variants using
two metrics: #wins and the average incomplete score. Assume that $\tau_X$ and $\tau_Y$
are the lowest-cost solutions computed by two solvers X and Y on a MaxSAT
instance F and that best-cost(F) is the lowest cost of a solution of F found
either in our evaluation or in the MaxSAT Evaluations. Then X wins over Y
if $\mathrm{COST}(F, \tau_X) < \mathrm{COST}(F, \tau_Y)$. The incomplete score score(F, X) obtained by
solver X on F is the ratio between best-cost(F) and the cost of the solution found by X, both offset by one, i.e., $\mathrm{score}(F, X) = (\text{best-cost}(F) + 1)/(\mathrm{COST}(F, \tau_X) + 1)$. The
score of X on F is 0 if X is unable to find any solution within 300s.
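Both metrics are straightforward to compute from per-instance costs. In the sketch below, `None` stands for "no solution found" and the function names are hypothetical; counting "found a solution" as a win over "found none" is an assumption of this sketch, since the definition above only covers the case where both solvers report a solution.

```python
from typing import Optional

def wins(cost_x: Optional[int], cost_y: Optional[int]) -> bool:
    """X wins over Y if COST(F, tau_X) < COST(F, tau_Y); None means no solution.
    Treating a found solution as beating no solution is an assumption."""
    if cost_x is None:
        return False
    return cost_y is None or cost_x < cost_y

def score(best_cost: int, cost_x: Optional[int]) -> float:
    """(best-cost(F) + 1) / (COST(F, tau_X) + 1), and 0 if X found no solution."""
    if cost_x is None:
        return 0.0
    return (best_cost + 1) / (cost_x + 1)

# Average score of one solver over three hypothetical instances.
results = [(10, 10), (0, 1), (5, None)]          # (best-cost(F), cost found by X)
print(sum(score(b, c) for b, c in results) / len(results))   # 0.5
```

The +1 offsets keep the score defined when best-cost(F) = 0.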

An overview of the results is shown in Table 1. The upper part of the table
shows a pairwise comparison on the number of wins over all benchmarks. The
wins are organized column-wise, i.e., the cell on row X, column Y contains the
total number of instances that the solver on column Y wins over the solver on
row X. The last row contains the average score obtained by each solver over
all instances. We observe that any form of preprocessing improves the performance of Loandra, as witnessed by the fact that no-prepro is clearly the worst-performing variant. The variants that make use of MaxPre 2 outperform the
baseline under both metrics; both maxpre2/none and maxpre2/VG obtain
a higher average score and win on more instances over base. The comparison
between maxpre2/none and maxpre2/VG is not as clear. On one hand, the score
obtained by maxpre2/VG is higher. On the other hand, maxpre2/none wins on
80 instances over maxpre2/VG and loses on 77. This suggests that the quality
of solutions computed by maxpre2/VG is on average higher, and that on the
instances on which maxpre2/none wins the difference is smaller.

Fig. 2. Impact of preprocessing on instance size.


7.3 Impact of Preprocessing on Instance Sizes 


In addition to improved solver runtimes, we note that MaxPre 2 has a positive 
effect on the size of instances (both in terms of the number of variables and 
clauses remaining) when compared to preprocessing with MaxPre 1; see Fig. 2 
for a comparison, with maxpre2/HRT compared to maxpre1 (left) and to original 
instance sizes (right). 


8 Conclusions 


We studied liftings of variants of propagation redundancy from SAT in the context of maximum satisfiability, where, in a more fine-grained way than in SAT, the
reasoning techniques of interest are those that preserve optimal cost. We showed that CPR,
the strongest MaxSAT-lifting, allows for changing minimal correction sets in
MaxSAT in a controlled way, thereby succinctly expressing MaxSAT reasoning techniques very generally. We also provided a practical MaxSAT preprocessor extended with techniques captured by CPR and showed empirically that
extended preprocessing has a positive overall impact on a range of MaxSAT
solvers. Interesting future work includes the development of new CPR-based
preprocessing rules for MaxSAT capable of significantly affecting the MaxSAT
solving pipeline both in theory and practice, as well as developing an understanding of the relationship between redundancy notions and the transformations performed by MaxSAT solving algorithms.
Abstract. The cvc5 SMT solver solves quantifier-free nonlinear real
arithmetic problems by combining the cylindrical algebraic coverings 
method with incremental linearization in an abstraction-refinement loop. 
The result is a complete algebraic decision procedure that leverages effi- 
cient heuristics for refining candidate models. Furthermore, it can be used 
with quantifiers, integer variables, and in combination with other theo- 
ries. We describe the overall framework, individual solving techniques, 
and a number of implementation details. We demonstrate its effective- 
ness with an evaluation on the SMT-LIB benchmarks. 


Keywords: Satisfiability modulo theories · Nonlinear real arithmetic ·
Abstraction refinement · Cylindrical algebraic coverings


1 Introduction 


SMT solvers are used as back-end engines for a wide variety of academic and 
industrial applications [2,19,20]. Efficient reasoning in the theory of real arith- 
metic is crucial for many such applications [5,8]. While modern SMT solvers 
have been shown to be quite effective at reasoning about linear real arithmetic 
problems [21,43], nonlinear problems are typically much more difficult. This is 
not surprising, given that the worst-case complexity for deciding the satisfiabil- 
ity of nonlinear real arithmetic formulas is doubly-exponential in the number 
of variables in the formula [15]. Nevertheless, a variety of techniques have been 
proposed and implemented, each attempting to target a class of formulas for 
which reasonable performance can be observed in practice. 


Related Work. All complete decision procedures for nonlinear real arithmetic 
(or the theory of the reals) originate in computer algebra, the most prominent 
being cylindrical algebraic decomposition (CAD) [11]. While alternatives exist 
[6,25,41], they have not seen much use [27], and CAD-based methods are the only
sound and complete methods in practical use today. CAD-based methods used 
in modern SMT solvers include incremental CAD implementations [34,36] and
cylindrical algebraic coverings [3], both of which are integrated in the traditional 
CDCL(T) framework for SMT [40]. 

In contrast, the NLSAT [30] calculus and the generalized MCSAT [28,39] 
framework provide for a much tighter integration of a conflict-driven CAD-based 
theory solver into a theory-aware core solver. This has been the dominant approach over the last decade due to its strong performance in practice. However,
it has the significant disadvantage of being difficult to integrate with CDCL(T)- 
based frameworks for theory combination. 

A number of incomplete techniques are also used by various SMT solvers: 
incremental linearization [9] gradually refines an abstraction of the nonlinear 
formula obtained via a naive linearization by refuting spurious models of the 
abstraction; interval constraint propagation [24,36,45] employs interval arith- 
metic to narrow down the search space; subtropical satisfiability [22] provides 
sufficient linear conditions for nonlinear solutions in the exponent space of the 
polynomials; and virtual substitution [12,31,46] makes use of parametric solu- 
tion formulas for polynomials of bounded degree. Though all of these techniques 
have limitations, each of them is useful for certain subclasses of nonlinear real 
arithmetic or in combination with other techniques. 


Contributions. We present an integration of cylindrical algebraic coverings and 
incremental linearization, implemented in the cvc5 SMT solver. Crucial to the 
success of the integration is an abstraction-refinement loop used to combine the 
two techniques cooperatively. The solution is effective in practice, as witnessed 
by the fact that cvc5 won the nonlinear real arithmetic category of SMT-COMP 
2021 [44], the first time a non-MCSAT-based technique has won since 2013. Our 
integrated technique also has the advantage of being very flexible: in particular, it 
fits into the regular CDCL(T) schema for theory solvers and theory combination, 
it supports (mixed) integer problems, and it can be easily extended using further 
subsolvers that support additional arithmetic operators beyond the scope of 
traditional algebraic routines (e.g., transcendental functions). 


2 Nonlinear Solving Techniques 


The nonlinear arithmetic solver implemented in cvc5 generally follows the 
abstraction-refinement framework introduced by Cimatti et al. [9] and depicted 
in Fig. 1. The input assertions are first checked by the linear arithmetic solver, 
where they are linearized implicitly by treating every application of multiplication as if it were an arithmetic variable. For example, given input assertions
$x \cdot y > 0 \wedge x > 1 \wedge y < 0$, the linear solver treats the expression $x \cdot y$ as a variable.
It may then find the (spurious) model: $x \mapsto 2$, $y \mapsto -1$, and $x \cdot y \mapsto 1$. We call
the candidate model returned by the linear arithmetic solver, where applications 
of multiplication are treated as variables, a linear model. If a linear model does 
not exist, i.e., the input is unsatisfiable according to the linear solver, the linear 
solver generates a conflict that is immediately returned to the CDCL(T) engine. 
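The linearization of the example can be mimicked by treating the monomial as an ordinary variable and then checking the linear model against the real semantics of multiplication. This is a toy sketch, not cvc5 code: the model is taken from the example rather than computed by a linear solver, and the naming scheme `"x*y"` for the abstraction variable is hypothetical.

```python
from fractions import Fraction

# The assertions x*y > 0, x > 1, y < 0, with the monomial x*y abstracted by a
# fresh variable named "x*y" (a hypothetical naming scheme for this sketch).
MONOMIALS = {"x*y": ("x", "y")}
ABSTRACT_ASSERTIONS = [
    lambda m: m["x*y"] > 0,
    lambda m: m["x"] > 1,
    lambda m: m["y"] < 0,
]

def satisfies_abstraction(model):
    """True if the model satisfies the linearized assertions."""
    return all(a(model) for a in ABSTRACT_ASSERTIONS)

def inconsistent_monomials(model):
    """Monomials whose abstract value differs from the product of their factors."""
    return [t for t, (u, v) in MONOMIALS.items() if model[t] != model[u] * model[v]]

linear_model = {"x": Fraction(2), "y": Fraction(-1), "x*y": Fraction(1)}
print(satisfies_abstraction(linear_model))    # True
print(inconsistent_monomials(linear_model))   # ['x*y'], since 2 * (-1) = -2 != 1
```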
Fig. 1. Structural overview of the nonlinear solver


When a linear model does exist, we check whether it already satisfies the 
input assertions or try to repair it to do so. We only apply a few very simple heuristics for repairs, such as updating the value for $z$ in the presence of a
constraint like $z = x \cdot y$ based on the values of $x$ and $y$.

If the model can not be repaired, we refine the abstraction for the linear 
solver [9]. This step constructs lemmas, or conflicts, based on the input asser- 
tions and the linear model, to advance the solving process by blocking either 
the current linear model or the current Boolean model, that is, the propositional 
assignment generated by the SMT solver’s SAT engine. The Boolean model is 
usually eliminated only by the coverings approach, while the incremental lin- 
earization technique generates lemmas with new literals that target the linear 
model, e.g., the lemma $x > 0 \wedge y < 0 \Rightarrow x \cdot y < 0$ in the example above. We next
describe our implementation of cylindrical algebraic coverings and incremental 
linearization, and how they are combined in cvc5. 
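A sign lemma such as the one above can be derived from the signs of the factors in the linear model. The sketch below is a hypothetical illustration that renders the lemma as text; it is not cvc5's lemma schema implementation, which works on internal terms.

```python
def sign(v):
    return (v > 0) - (v < 0)

def sign_lemma(x, y, model):
    """If the model value of x*y contradicts sign(x) * sign(y), return a lemma
    (as text) refuting it; return None if the model is sign-consistent."""
    sx, sy = sign(model[x]), sign(model[y])
    expected = sx * sy
    if sign(model[f"{x}*{y}"]) == expected:
        return None
    rel = {1: ">", -1: "<", 0: "="}
    premise = f"{x} {rel[sx]} 0 ∧ {y} {rel[sy]} 0"
    return f"{premise} ⇒ {x}*{y} {rel[expected]} 0"

model = {"x": 2, "y": -1, "x*y": 1}      # the spurious linear model from the example
print(sign_lemma("x", "y", model))       # x > 0 ∧ y < 0 ⇒ x*y < 0
```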


2.1 Cylindrical Algebraic Coverings 


Cylindrical algebraic coverings is a technique recently proposed by Ábrahám et 
al. [3] and is heavily inspired by CAD. While the way the computation proceeds 
is very different from traditional CAD, and instead somewhat similar to NLSAT 
[30], their mathematical underpinnings are essentially identical. The cylindri- 
cal algebraic coverings subsolver in cvc5 closely follows the presentation in [3]. 
Below, we discuss some differences and extensions. For this discussion, we must 
refer the reader to [3] for the relevant background material because of space 
constraints. We note that cvc5 relies on the libpoly library [29] to provide most 
of the computational infrastructure for algebraic reasoning. 


Square-Free Basis. As with most CAD projection schemas, the set of projection 
polynomials needs to be a square-free basis when computing the characterization 
for an interval in [3, Algorithm 4]. However, the resultants computed in this 
algorithm combine polynomials from different sets, which are not necessarily 
coprime. The remedy is to either make these sets of polynomials pairwise square- 
free or to fully factor all projection polynomials. We adopt the former approach. 
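The "pairwise square-free" remedy amounts to splitting polynomials along their common factors until the set is square-free and pairwise coprime. The sketch below illustrates only this idea for univariate polynomials over the rationals (coefficient lists, lowest degree first); the projection polynomials in cylindrical algebraic coverings are multivariate, and cvc5 relies on libpoly for this arithmetic. All function names are hypothetical.

```python
from fractions import Fraction

# Univariate polynomials over Q as coefficient lists, lowest degree first.

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def monic(p):
    p = trim(p)
    return [Fraction(c) / p[-1] for c in p] if p else p

def divmod_poly(a, b):
    """Polynomial long division a = q*b + r with deg r < deg b (b nonzero)."""
    a, b = trim(Fraction(c) for c in a), trim(Fraction(c) for c in b)
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    while a and len(a) >= len(b):
        k, f = len(a) - len(b), a[-1] / b[-1]
        q[k] = f
        for i, c in enumerate(b):
            a[i + k] -= f * c
        a = trim(a)
    return trim(q), a

def gcd_poly(a, b):
    a, b = trim(a), trim(b)
    while b:
        a, b = b, divmod_poly(a, b)[1]
    return monic(a)

def squarefree(p):
    """p / gcd(p, p'), normalized to be monic."""
    dp = trim([i * c for i, c in enumerate(p)][1:])
    return monic(divmod_poly(p, gcd_poly(p, dp))[0])

def coprime_squarefree_basis(polys):
    """Split the input until every element is square-free and all pairs are coprime."""
    basis = []
    for p in polys:
        s = squarefree(p)
        if len(s) > 1 and s not in basis:          # ignore constants and duplicates
            basis.append(s)
    split = True
    while split:
        split = False
        for i in range(len(basis)):
            for j in range(i + 1, len(basis)):
                g = gcd_poly(basis[i], basis[j])
                if len(g) > 1:                     # non-constant common factor
                    parts = [g, monic(divmod_poly(basis[i], g)[0]),
                             monic(divmod_poly(basis[j], g)[0])]
                    basis = [b for k, b in enumerate(basis) if k not in (i, j)]
                    for part in parts:
                        if len(part) > 1 and part not in basis:
                            basis.append(part)
                    split = True
                    break
            if split:
                break
    return basis

# (x - 1)^2 (x - 2) and (x - 2)(x - 3) yield the basis {x - 1, x - 2, x - 3}.
print(sorted(tuple(map(int, b)) for b in coprime_squarefree_basis([[-2, 5, -4, 1], [6, -5, 1]])))
```

Each split strictly lowers the total degree of the set, so the loop terminates.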


Starting Model. Although the linear model may not satisfy the nonlinear con- 
straints, we may expect it to be in the vicinity of a proper model. We thus 
optionally use the linear model as an initial assignment for the cylindrical alge-
braic coverings algorithm in one of two ways: either using it initially in the search 
and discarding it as soon as it conflicts; or using it whenever possible, even if 
it leads to a conflict in another branch of the search. Unfortunately, neither 
technique has any discernible impact in our experiments. 


Interval Pruning. As already noted in [3], a covering may contain two kinds 
of redundant intervals: intervals fully contained in another interval, or intervals 
contained in the union of other intervals. Removing the former kind of redundan- 
cies is not only clearly beneficial, but also required for how the characterizations 
are computed. It is not clear, however, if it is worthwhile to remove redundancies 
of the second kind because, while it can simplify the characterization locally, it 
may also make the resulting interval smaller, slowing down the overall solving 
process. Note that there may not be a unique redundant interval: e.g., if multi- 
ple intervals overlap, it may be possible to remove one of two intervals, but not 
both of them. We have implemented a simple heuristic to detect redundancies 
of the second kind, always removing the smallest interval with respect to the 
interval ordering given in [3]. Even though these redundancies occur in about 7.5%
of all QF_NRA benchmarks, using this technique has only a very limited impact.
It may be that for certain kinds of benchmarks, underrepresented in SMT-LIB, 
the technique is valuable. Or it may be that some variation of the technique is 
more broadly helpful. These are interesting directions for future work. 
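The two kinds of redundancy can be illustrated on plain closed intervals of numbers. This sketch is a simplification: intervals in cylindrical algebraic coverings may be open, may have algebraic or infinite endpoints, and "smallest" refers to the interval ordering of [3], for which interval length is used here only as a stand-in. The function names are hypothetical.

```python
def covered(target, others):
    """True if the closed interval target lies inside the union of others."""
    lo, hi = target
    reach = None                       # covered prefix is [lo, reach]
    for a, b in sorted(others):
        if b < lo:
            continue                   # entirely left of the target
        if a > (lo if reach is None else reach):
            break                      # gap that no later interval can fill
        reach = b if reach is None else max(reach, b)
        if reach >= hi:
            return True
    return False

def prune(intervals):
    ivs = sorted(set(intervals))
    # First kind: drop intervals contained in some other single interval.
    ivs = [i for i in ivs
           if not any(j != i and j[0] <= i[0] and i[1] <= j[1] for j in ivs)]
    # Second kind: repeatedly drop the smallest interval covered by the others.
    while True:
        redundant = [i for i in ivs if covered(i, [j for j in ivs if j != i])]
        if not redundant:
            return ivs
        ivs.remove(min(redundant, key=lambda i: (i[1] - i[0], i[0])))

print(prune([(0, 5), (1, 2), (4, 8), (5, 10), (7, 9)]))   # [(0, 5), (5, 10)]
```

Removing one redundant interval at a time matters: of two overlapping intervals that are each covered by the rest, removing both could leave a gap.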


Lifting and Coefficient Selection with Lazard. The original cylindrical algebraic 
coverings technique is based on McCallum’s projection operator [37], which is 
particularly well-studied, but also (refutationally) unsound: polynomial nullifi- 
cation may occur when computing the real roots, possibly leading to the loss of 
real roots and thus solution candidates. One then needs to check for these cases 
and fall back to a more conservative, albeit more costly, projection schema such 
as those due to Collins [11] or Hong [26]. 

Lazard’s projection schema [35], which has been proven correct only recently 
[38], provides very small projection sets and is both sound and complete. This 
comes at the price of a different mathematical background and a modified lifting 
procedure, which corresponds to a modified procedure for real root isolation. 
Although the local projections employed in cylindrical algebraic coverings have 
not been formally verified for Lazard’s projection schema yet, we expect no 
significant issues there. Adopting it seems to be a logical improvement, as already 
mentioned in [3]. The modified real root isolation procedure is a significant hurdle 
in practice, as it requires additional nontrivial algorithms [32, Section 5.3.2]. We 
implemented it using CoCoALib [1] in cvc5 [33], achieving soundness without 
any discernible negative performance impact. 

Using Lazard’s projection schema, for all its benefits, may seem questionable
for the following reasons: (i) the unsoundness of McCallum’s projection operator
is virtually never witnessed in practice [32,33, Section 6.5], and (ii) the projection
sets computed by Lazard’s and McCallum’s projection operators are identical
for more than 99.5% of all QF_NRA benchmarks [33]. We argue, though, that working in
the domain of formal verification warrants the effort of obtaining a (provably)
correct result, especially if it does not incur a performance overhead.


Proof Generation. Recently, generating formal proofs to certify the result of SMT 
solvers has become an area of focus. In particular, there is a large and ongoing 
effort to produce proofs in cvc5. The incremental linearization approach can be 
seen as an oracle which produces lemmas that are easy to prove individually, so 
cvc5 does generate proofs for them; the complex part is finding those lemmas 
and making sure they actually help the solver make progress. 

The situation is very different for cylindrical algebraic coverings: the pro- 
duced lemma is the infeasible subset, and we usually have no simpler proof than 
the computations relying on CAD theory. That said, cylindrical algebraic cover- 
ings appear to be more amenable to automatic proof generation than traditional 
CAD-based approaches [4,14]. In fact, although making these proofs detailed 
enough for automated verification is still an open problem, they are already bro- 
ken into smaller parts that closely follow the tree-shaped computation of the 
algorithm. This allows cvc5 to produce at least a proof skeleton in that case. 


2.2 Incremental Linearization 


Our theory solver for nonlinear (real) arithmetic optionally uses lemma schemas 
following the incremental linearization approaches described by Cimatti et al. [9] 
and Reynolds et al. [42]. These schemas incrementally refine candidate models 
from the linear arithmetic solver by introducing selected quantifier-free lemmas 
that express properties of multiplication, such as signedness (e.g., $x > 0 \wedge y > 0 \Rightarrow x \cdot y > 0$) or monotonicity (e.g., $|x| > |y| \Rightarrow x \cdot x > y \cdot y$). They are generated
as needed to refute spurious models that violate these properties. 

Most lemma schemas built-in in cvc5 are crafted so as to avoid introducing 
new monomial terms or coefficients, since that could lead to non-termination in 
the CDCL(T) search. As a notable exception, we rely on a lemma schema for 
tangent planes for multiplication [9], which can be used to refute the candidate 
model for any application of the multiplication operator $\cdot$ whose value in the
linear model is inconsistent with the standard interpretation of $\cdot$. Note that
since these lemmas depend upon the current model value chosen for arithmetic 
variables, tangent plane lemmas may introduce an unbounded number of new 
literals into the search. The set of lemma schemas used by the solver is user- 
configurable, as described in the following section. 
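The tangent plane construction rests on the identity $x \cdot y - T(x, y) = (x - a)(y - b)$ for $T(x, y) = b \cdot x + a \cdot y - a \cdot b$. The sketch below shows the standard form of these lemmas as we understand it from [9], rendered as text; the exact lemma shape used in cvc5 may differ, and the function names are hypothetical. Because the lemma is built from the current model point $(a, b)$, each refinement may introduce new literals, which is why the text notes that tangent plane lemmas can add an unbounded number of them.

```python
def tangent_plane(a, b, x, y):
    """T(x, y) = b*x + a*y - a*b, the tangent plane of x*y at the point (a, b)."""
    return b * x + a * y - a * b

def tangent_lemma(a, b, v):
    """Return a tangent plane lemma (as text) refuting the linear model
    x -> a, y -> b, x*y -> v, or None if v agrees with a*b. Both forms are
    valid since x*y - T(x, y) = (x - a) * (y - b)."""
    if v == a * b:
        return None
    plane = f"{b}*x + {a}*y + {-a * b}"
    if v > a * b:                  # model value too large: upper-bound lemma
        cond = f"(x <= {a} ∧ y >= {b}) ∨ (x >= {a} ∧ y <= {b})"
        return f"{cond} ⇒ x*y <= {plane}"
    cond = f"(x <= {a} ∧ y <= {b}) ∨ (x >= {a} ∧ y >= {b})"   # value too small
    return f"{cond} ⇒ x*y >= {plane}"

print(tangent_lemma(2, -1, 1))
# (x <= 2 ∧ y >= -1) ∨ (x >= 2 ∧ y <= -1) ⇒ x*y <= -1*x + 2*y + 2
```

At the model point both disjuncts of the premise hold, so the lemma forces $x \cdot y \leq -2$ and refutes the spurious value 1.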


2.3 Strategy 


The overall theory solver for nonlinear arithmetic is built from several subsolvers, 
implementing the techniques described above, using a rather naive strategy, 
as summarized in Algorithm 1. After a spurious linear model has been con- 
structed that cannot be repaired, we first apply a subset of the lemma schemas 
that do not introduce an unbounded number of new terms (with procedure 
IncLinearizationLight); then, we continue with the remaining lemma schemas 
1 Function NlSolve(assertions)

2 if not LinearSolve(assertions) then return linear conflict 

3 M = linear model for assertions 

4 if RepairModel (assertions, M) then return repaired model 

5 if IncLinearizationLight (assertions, M) then return lemmas 
6 if IncLinearizationFull (assertions, M) then return lemmas 
7 return Coverings (assertions, M) 


Algorithm 1: Strategy for nonlinear arithmetic solver 


(with procedure IncLinearizationFull); finally, we resort to the coverings 
solver which is guaranteed to find either a conflict or a model. Internally, each 
procedure sequentially tries its assigned lemma schemas from [9,42] until it con- 
structs a lemma that can block the spurious model. 

The approach is dynamically configured based on input options and the logic 
of the input formula. For example, by default, we disable IncLinearizationFull 
for QF_NRA as it tends to diverge in cases where the coverings solver quickly 
terminates. 
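Algorithm 1 can be read as the following Python sketch, in which each subsolver is passed in as a callback. The callback signatures and the `use_full`/`use_coverings` flags are assumptions for illustration (the flags model the option- and logic-dependent configuration just described); they are not cvc5's API.

```python
def nl_solve(assertions, linear_solve, repair_model, inclin_light,
             inclin_full, coverings, use_full=True, use_coverings=True):
    """Sketch of Algorithm 1 with hypothetical callback signatures.

    linear_solve(assertions) returns a linear model, or None on conflict;
    repair_model, inclin_light and inclin_full take (assertions, model) and
    return a repaired model / a list of lemmas, or a falsy value on failure;
    coverings(assertions, model) returns either a conflict or a model.
    """
    model = linear_solve(assertions)                  # lines 2-3
    if model is None:
        return ("conflict", "linear")
    repaired = repair_model(assertions, model)        # line 4
    if repaired:
        return ("sat", repaired)
    lemmas = inclin_light(assertions, model)          # line 5
    if lemmas:
        return ("lemmas", lemmas)
    if use_full:                                      # line 6
        lemmas = inclin_full(assertions, model)
        if lemmas:
            return ("lemmas", lemmas)
    if use_coverings:                                 # line 7
        return coverings(assertions, model)
    return ("unknown", None)


# Toy callbacks: no repair and no lemmas, so the coverings solver decides.
result = nl_solve([], lambda a: {"x": 1}, lambda a, m: None,
                  lambda a, m: [], lambda a, m: [],
                  lambda a, m: ("sat", m), use_full=False)
print(result)  # ('sat', {'x': 1})
```

Returning lemmas instead of looping mirrors the CDCL(T) setting: the lemmas go back to the SAT solver, which later invokes the theory solver again with a new linear model.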


2.4 Beyond QF_NRA 


The presented solver primarily targets quantifier-free nonlinear real arithmetic,
but is also used in the presence of quantifiers and with multiple theories.


Quantified Logics. Solving quantified logics for nonlinear arithmetic requires solv- 
ing quantifier-free subproblems, and thus any improvement to quantifier-free 
solving also benefits solving with quantifiers. In practice, however, the instanti- 
ation heuristics are just as important for overall solver performance. 


Multiple Theories. The theory combination framework as implemented in cvc5 
requires evaluating equalities over the combined model. To support this func- 
tionality, real algebraic numbers had to be properly integrated into the entire 
solver; in particular, the ability to compute with these numbers could not be 
local to the cylindrical algebraic coverings module or even the nonlinear solver. 


3 Experimental Results 


We evaluate our implementation within cvc5 (commit id 449dd7e) in comparison 
with other SMT solvers on all 11552 benchmarks in the quantifier-free nonlinear 
real arithmetic (QF_NRA) logic of SMT-LIB. We consider three configurations of 
cvc5, each of which runs a subset of steps from Algorithm 1. All the configurations
run lines 2-4. In addition, cvc5.cov runs line 7, cvc5.inclin runs lines
5 and 6, and cvc5 runs lines 5 and 7. All experiments were conducted on Intel
Xeon E5-2637v4 CPUs with a time limit of 20 min and a memory limit of 8 GB.

We compare cvc5 with recent versions of all other SMT solvers that partici- 
pated in the QF_NRA logic of SMT-COMP 2021 [44]: MathSAT 5.6.6 [10], SMT-RAT 
19.10.560 [13], veriT [7] (veriT+raSAT+Redlog), Yices2 2.6.4 [18] (Yices-QS for 
quantified logics), and z3 4.8.14 [16]. MathSAT employs an abstraction-refinement
mechanism very similar to the one described in Sect. 2.2; veriT [23] forwards non- 
linear arithmetic problems to the external tools raSAT [45], which uses interval 
constraint propagation, and Redlog/Reduce [17], which focuses on virtual sub- 
stitution and cylindrical algebraic decomposition; SMT-RAT, Yices2, and z3 all 
implement some variant of MCSAT [30]. Note that SMT-RAT also implements the 
cylindrical algebraic coverings approach, but it is less effective than SMT-RAT’s 
adaptation of MCSAT [3]. 


(a) QF_NRA

Solver        sat    unsat   solved
cvc5          5137   5596    10733
Yices2        4966   5450    10416
z3            5136   5207    10343
cvc5.cov      5001   5077    10078
SMT-RAT       4828   5038     9866
veriT         4522   5034     9556
MathSAT       3645   5357     9002
cvc5.inclin   3421   5376     8797

(b) NRA and QF_UFNRA

NRA           sat    unsat   solved
Yices2         231   3817     4048
z3             236   3812     4048
cvc5.cov       236   3809     4045
cvc5           221   3809     4030
cvc5.inclin    120   3786     3906

QF_UFNRA      sat    unsat   solved
z3              24     11       35
Yices2          23     11       34
cvc5            20     11       31
cvc5.inclin     12     11       23
cvc5.cov         2     11       13


Fig. 2. (a) Experiments for QF_NRA (b) Experiments for NRA and QF_UFNRA


Figure 2a shows that cvc5 significantly outperforms all other QF_NRA 
solvers. Both the coverings approach (cvc5.cov) and the incremental lineariza- 
tion approach (cvc5.inclin) contribute substantially to the overall perfor- 
mance of the unified solver in cvc5, with coverings solving many satisfiable 
instances, and incremental linearization helping on unsatisfiable ones. Even 
though cvc5.inclin closely follows [9], it outperforms MathSAT on unsatisfi- 
able benchmarks, those where cvc5 relies on incremental linearization the most. 

Comparing cvc5 and Yices2 is particularly interesting, as the coverings app- 
roach in cvc5 and the NLSAT solver in Yices2 both rely on libpoly [29], thus 
using the same implementation of algebraic numbers and operations over them. 
Our integration of incremental linearization and algebraic coverings is compat- 
ible with the traditional CDCL(T) framework and outperforms the alternative 
NLSAT approach, which is specially tailored to nonlinear real arithmetic. 

Going beyond QF_NRA, we also evaluate the performance of our solver in 
the context of theory combination (with all 37 benchmarks from QF_UFNRA) and 
quantifiers (with all 4058 benchmarks from NRA). There, cvc5 is a close runner-up 
to Yices2 and z3, thanks to the coverings subsolver which significantly improves 
cvc5’s performance. We conjecture that the remaining gap is due to components 
other than the nonlinear arithmetic solver, such as the solver for equality and 
uninterpreted functions, details of theory combination, or quantifier instantiation 
heuristics. Interestingly, the sets of unsolved instances in NRA are almost disjoint
for cvc5.cov, Yices2, and z3, suggesting that each tool could solve the remaining
benchmarks with reasonable extra effort.


4 Conclusion 


We have presented an approach for solving quantifier-free nonlinear real 
arithmetic problems that combines previous approaches based on incremen- 
tal linearization [9] and cylindrical algebraic coverings [3] into one coherent 
abstraction-refinement loop. The resulting implementation is very effective, out- 
performing other state-of-the-art solver implementations, and integrates seam- 
lessly in the CDCL(T) framework. 

The general approach also applies to integer problems, quantified formulas, 
and instances with multiple theories, and can additionally be used in combina- 
tion with transcendental functions [9] and bitwise conjunction for integers [47]. 
Further evaluations of these combinations are left to future work. 
Abstract. The propagation redundant (PR) proof system generalizes the resolu-
tion and resolution asymmetric tautology proof systems used by conflict-driven 
clause learning (CDCL) solvers. PR allows short proofs of unsatisfiability for 
some problems that are difficult for CDCL solvers. Previous attempts to auto- 
mate PR clause learning used hand-crafted heuristics that work well on some 
highly-structured problems. For example, the solver SADICAL incorporates PR 
clause learning into the CDCL loop, but it cannot compete with modern CDCL 
solvers due to its fragile heuristics. We present PRELEARN, a preprocessing tech- 
nique that learns short PR clauses. Adding these clauses to a formula reduces the 
search space that the solver must explore. By performing PR clause learning as 
a preprocessing stage, PR clauses can be found efficiently without sacrificing 
the robustness of modern CDCL solvers. On a large portion of SAT competi- 
tion benchmarks we found that preprocessing with PRELEARN improves solver 
performance. In addition, there were several satisfiable and unsatisfiable formu- 
las that could only be solved after preprocessing with PRELEARN. PRELEARN 
supports proof logging, giving a high level of confidence in the results. 


1 Introduction 


Conflict-driven clause learning (CDCL) [27] is the standard paradigm for solving the 
satisfiability problem (SAT) in propositional logic. CDCL solvers learn clauses implied 
through resolution inferences. Additionally, all competitive CDCL solvers use pre- and 
in-processing techniques captured by the resolution asymmetric tautology (RAT) proof 
system [21]. Yet some problems remain hard: the well-studied pigeonhole and mutilated
chessboard problems are challenging benchmarks with exponentially-sized resolution proofs [1,12]. It
is possible to construct small hand-crafted proofs for the pigeonhole problem using 
extended resolution (ER) [8], a proof system that allows the introduction of new vari- 
ables [32]. ER can be expressed in RAT but has proved difficult to automate due to the 
large search space. Even with modern inprocessing techniques, many CDCL solvers 
struggle on these seemingly simple problems. The propagation redundant (PR) proof 
system allows short proofs for these problems [14,15], and unlike ER, it requires no new
variables. This makes PR an attractive candidate for automation.

At a high level, CDCL solvers make decisions that typically yield an unsatisfiable 
branch of a problem. The clause that prunes the unsatisfiable branch from the search 
space is learned, and the solver continues by searching another branch. PR extends this 
paradigm by allowing more aggressive pruning. In the PR proof system a branch can
be pruned as long as there exists another branch that is at least as satisfiable. As an
example, consider the mutilated chessboard. The mutilated chessboard problem involves
finding a covering of an n × n chessboard with two opposite corners removed by 2 × 1
dominoes (see Section 5.4). Given two horizontally oriented dominoes covering a
2 × 2 square, two vertically oriented dominoes could cover the same 2 × 2 square. For
any solution that uses the dominoes in the horizontal orientation, replacing them with the
dominoes in the vertical orientation would also be a solution. The second orientation is as
satisfiable as the first, and so the first can be pruned from the search space. Even though
the number of possible solutions may be reduced, the pruning is satisfiability preserv- 
ing. This is a powerful form of reasoning that can efficiently remove many symmetries 
from the mutilated chessboard, making the problem much easier to solve [15]. 
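The symmetry argument can be checked mechanically. In the Python sketch below (function names are illustrative), a domino is a frozenset of two board cells; `swap_horizontal_pair` replaces two horizontal dominoes covering a 2 × 2 square with two vertical ones, and `is_tiling` confirms that the result still covers exactly the same cells. A small 2 × 4 board is used instead of a mutilated board, since the mutilated board has no tiling to transform.

```python
def is_tiling(tiles, cells):
    """True if the dominoes in `tiles` cover every cell in `cells` exactly once."""
    covered = [cell for tile in tiles for cell in tile]
    return len(covered) == len(set(covered)) and set(covered) == set(cells)


def swap_horizontal_pair(tiles, r, c):
    """Replace two horizontal dominoes covering the 2x2 square with top-left
    cell (r, c) by two vertical dominoes; return None if they are absent."""
    top = frozenset({(r, c), (r, c + 1)})
    bottom = frozenset({(r + 1, c), (r + 1, c + 1)})
    if top not in tiles or bottom not in tiles:
        return None
    left = frozenset({(r, c), (r + 1, c)})
    right = frozenset({(r, c + 1), (r + 1, c + 1)})
    return (set(tiles) - {top, bottom}) | {left, right}


cells = [(r, c) for r in range(2) for c in range(4)]
horizontal = {frozenset({(r, c), (r, c + 1)}) for r in range(2) for c in (0, 2)}
swapped = swap_horizontal_pair(horizontal, 0, 0)
print(is_tiling(horizontal, cells), is_tiling(swapped, cells))  # True True
```

Since the swap maps every tiling that uses the horizontal pair to a tiling that does not, forbidding the horizontal pair cannot turn a satisfiable instance into an unsatisfiable one.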

The satisfaction-driven clause learning (SDCL) solver SADICAL [16] incorporates 
PR clause learning into the CDCL loop. SADICAL implements hand-crafted decision 
heuristics that exploit the canonical structure of the pigeonhole and mutilated chess- 
board problems to find short proofs. However, SADICAL’s performance deteriorates 
under slight variations of the problems, including different constraint encodings [7].
The heuristics were developed from a few well-understood problems and do not gener- 
alize to other problem classes. Further, the heuristics for PR clause learning are likely 
ill-suited for CDCL, making the solver less robust. 

In this paper, we present PRELEARN, a preprocessing technique for learning PR 
clauses. PRELEARN alternates between finding and learning PR clauses. We develop 
multiple heuristics for finding PR clauses and multiple configurations for learning some 
subset of the found PR clauses. As PR clauses are learned we use failed literal prob- 
ing [11] to find unit clauses implied by the formula. The preprocessing is made efficient 
by taking advantage of the inner/outer solver framework in SADICAL. The learned PR 
clauses are added to the original formula, aggressively pruning the search space in an ef- 
fort to guide CDCL solvers to short proofs. With this method PR clauses can be learned 
without altering the complex heuristics that make CDCL solvers robust. PRELEARN 
focuses on finding short PR clauses and failed literals to effectively reduce the search 
space. This is done with general heuristics that work across a wide range of problems. 

Most SAT solvers support logging proofs of unsatisfiability for independent check- 
ing [17,20,33]. This has proved valuable for verifying solutions independent of a (po- 
tentially buggy) solver. Modern SAT solvers log proofs in the DRAT proof system 
(RAT [21] with deletions). DRAT captures all widely used pre- and in-processing tech- 
niques including bounded variable elimination [10], bounded variable addition [26], 
and extended learning [4,32]. DRAT can express the common symmetry-breaking
techniques, but doing so is complicated [13]. PR can compactly express some symmetry-breaking
techniques [14,15], yielding short proofs that can be checked by the proof checker 
DPR-TRIM [16]. PR gives a framework for strong symmetry-breaking inferences and 
also maintains the highly desirable ability to independently verify proofs. 

The contributions of this paper include: (1) giving a high-level algorithm for ex- 
tracting PR clauses, (2) implementing several heuristics for finding and learning PR 
clauses, (3) evaluating the effectiveness of different heuristic configurations, and (4) 
assessing the impact of PRELEARN on solver performance. PRELEARN improves the 
performance of the CDCL solver KISSAT on a quarter of the satisfiable and unsatisfiable
competition benchmarks we considered. The improvement is significant for a number 
of instances that can only be solved by KISSAT after preprocessing. Most of them come 
from hard combinatorial problems with small formulas. In addition, PRELEARN di- 
rectly produces refutation proofs for the mutilated chessboard problem containing only 
unit and binary PR clauses. 


2 Preliminaries 


We consider propositional formulas in conjunctive normal form (CNF). A CNF formula
$\psi$ is a conjunction of clauses where each clause is a disjunction of literals. A literal $l$ is
either a variable $x$ (positive literal) or a negated variable $\bar{x}$ (negative literal). For a set
of literals $L$ the formula $\psi(L)$ is the set of clauses $\{C \in \psi \mid C \cap L \neq \emptyset\}$.

An assignment is a mapping from variables to truth values 1 (true) and 0 (false).
An assignment is total if it assigns every variable to a value, and partial if it assigns a
subset of variables to values. The set of variables occurring in a formula, assignment,
or clause is given by $\mathrm{var}(\psi)$, $\mathrm{var}(\alpha)$, or $\mathrm{var}(C)$. For a literal $l$, $\mathrm{var}(l)$ is a variable.

An assignment $\alpha$ satisfies a positive (negative) literal $l$ if $\alpha$ maps $\mathrm{var}(l)$ to true ($\alpha$
maps $\mathrm{var}(l)$ to false, respectively), and falsifies it if $\alpha$ maps $\mathrm{var}(l)$ to false ($\alpha$ maps
$\mathrm{var}(l)$ to true, respectively). We write a finite partial assignment as the set of literals it
satisfies. An assignment satisfies a clause if the clause contains a literal satisfied by the
assignment. An assignment satisfies a formula if every clause in the formula is satisfied
by the assignment. A formula is satisfiable if there exists a satisfying assignment, and
unsatisfiable otherwise. Two formulas are logically equivalent if they share the same set
of satisfying assignments. Two formulas are satisfiability equivalent if they are either
both satisfiable or both unsatisfiable.

If an assignment $\alpha$ satisfies a clause $C$ we define $C|_\alpha = \top$, otherwise $C|_\alpha$ represents
the clause $C$ with the literals falsified by $\alpha$ removed. The empty clause is denoted
by $\bot$. The formula $\psi$ reduced by an assignment $\alpha$ is given by $\psi|_\alpha = \{C|_\alpha \mid C \in
\psi \text{ and } C|_\alpha \neq \top\}$. Given an assignment $\alpha = l_1 \dots l_n$, $C = (\bar{l}_1 \lor \dots \lor \bar{l}_n)$ is the clause
that blocks $\alpha$. The assignment blocked by a clause is the negation of the literals in the
clause. The literals touched by an assignment are defined by $\mathrm{touched}_\alpha(C) = \{l \mid l \in
C \text{ and } \mathrm{var}(l) \in \mathrm{var}(\alpha)\}$ for a clause. For a formula $\psi$, $\mathrm{touched}_\alpha(\psi)$ is the union of
the touched literals of each clause in $\psi$. A unit is a clause containing a single literal.
The unit clause rule takes the assignment $\alpha$ of all units in a formula $\psi$ and generates
$\psi|_\alpha$. Iteratively applying the unit clause rule until fixpoint is referred to as unit propagation.
In cases where unit propagation yields $\bot$ we say it derived a conflict. A formula
$\psi$ implies a formula $\psi'$, denoted $\psi \models \psi'$, if every assignment satisfying $\psi$ satisfies $\psi'$.
By $\psi \vdash_1 \psi'$ we denote that for every clause $C \in \psi'$, applying unit propagation to the
assignment blocked by $C$ in $\psi$ derives a conflict. If unit propagation derives a conflict
on the formula $\psi \cup \{\{l\}\}$, we say $l$ is a failed literal and the unit $\bar{l}$ is logically implied by
the formula. Failed literal probing [11] is the process of successively assigning literals
to check if units are implied by the formula. In its simplest form, probing involves assigning
a literal $l$ and learning the unit $\bar{l}$ if unit propagation derives a conflict; otherwise
$l$ is unassigned and the next literal is checked.
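The definitions above translate directly into code. The following Python sketch is illustrative only (names such as `reduce_formula` and `probe_failed_literals` are hypothetical and not taken from PRELEARN or SADICAL): literals are nonzero integers in DIMACS style ($-x$ stands for $\bar{x}$), clauses are frozensets of literals, a partial assignment is the set of literals it satisfies, and $\top$ is represented by `None`. Unit propagation here recomputes $\psi|_\alpha$ in each round instead of using watch pointers, which keeps the sketch short but makes it far slower than a real solver.

```python
def reduce_clause(clause, alpha):
    """C|alpha: None stands for T (satisfied); otherwise drop falsified literals."""
    if any(l in alpha for l in clause):
        return None
    return frozenset(l for l in clause if -l not in alpha)


def reduce_formula(psi, alpha):
    """psi|alpha = {C|alpha | C in psi and C|alpha != T}."""
    return {r for r in (reduce_clause(c, alpha) for c in psi) if r is not None}


def blocking_clause(alpha):
    """The clause that blocks alpha: the negation of its literals."""
    return frozenset(-l for l in alpha)


def touched(clause, alpha):
    """touched_alpha(C): literals of C whose variable is assigned by alpha."""
    assigned = {abs(l) for l in alpha}
    return frozenset(l for l in clause if abs(l) in assigned)


def unit_propagate(psi, alpha=()):
    """Apply the unit clause rule until fixpoint; return (alpha, conflict)."""
    alpha = set(alpha)
    while True:
        reduced = reduce_formula(psi, alpha)
        if frozenset() in reduced:
            return alpha, True               # derived the empty clause
        units = {next(iter(c)) for c in reduced if len(c) == 1}
        if any(-u in units for u in units):
            return alpha | units, True       # complementary units
        if not units:
            return alpha, False
        alpha |= units


def probe_failed_literals(psi):
    """Simplest probing: if psi plus unit l propagates to a conflict, learn -l."""
    psi = set(psi)
    learned = []
    for v in sorted({abs(l) for c in psi for l in c}):
        for l in (v, -v):
            _, conflict = unit_propagate(psi | {frozenset({l})})
            if conflict and frozenset({-l}) not in psi:
                psi.add(frozenset({-l}))
                learned.append(-l)
    return learned, psi


# (not x1 or x2) and (not x1 or not x2): x1 is a failed literal.
psi = {frozenset({-1, 2}), frozenset({-1, -2})}
print(probe_failed_literals(psi)[0])  # [-1]
```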
To evaluate the satisfiability of a formula, a CDCL solver [27] iteratively performs
the following operations: First, the solver performs unit propagation, then tests for a 
conflict. Unit propagation is made efficient with two-literal watch pointers [28]. If there 
is no conflict and all variables are assigned, the formula is satisfiable. Otherwise, the 
solver chooses an unassigned variable through a variable decision heuristic [6,25], as- 
signs a truth value to it, and performs unit propagation. If, however, there is a conflict, 
the solver performs conflict analysis, potentially learning a short clause. If this
clause is the empty clause, the formula is unsatisfiable.


3 The PR Proof System 


A clause $C$ is redundant w.r.t. a formula $\psi$ if $\psi$ and $\psi \cup \{C\}$ are satisfiability equivalent.
The clause sequence $\psi, C_1, C_2, \dots, C_n$ is a clausal proof of $C_n$ if each clause $C_i$ ($1 \leq
i \leq n$) is redundant w.r.t. $\psi \cup \{C_1, C_2, \dots, C_{i-1}\}$. The proof is a refutation of $\psi$ if $C_n$
is $\bot$. Clausal proof systems may also allow deletion. In a refutation proof, clauses can
be deleted freely because the deletion cannot make a formula less satisfiable.

Clausal proof systems are distinguished by the kinds of redundant clauses they allow 
to be added. The standard SAT solving paradigm CDCL learns clauses implied through 
resolution. These clauses are logically implied by the formula, and fall under the reverse 
unit propagation (RUP) proof system. The Resolution Asymmetric Tautology (RAT) 
proof system generalizes RUP. All commonly used inprocessing techniques emit DRAT 
proofs. The propagation redundant (PR) proof system generalizes RAT by allowing the 
pruning of branches without loss of satisfiability.

Let $C$ be a clause, $\psi$ a formula, and $\alpha$ the assignment blocked by $C$. Then $C$ is
PR w.r.t. $\psi$ if and only if there exists an assignment $\omega$ such that $\psi|_\alpha \vdash_1 \psi|_\omega$ and $\omega$
satisfies $C$. Intuitively, this allows inferences that block a partial assignment $\alpha$ as long
as another assignment $\omega$ is at least as satisfiable. This means every assignment containing $\alpha$
that satisfies $\psi$ can be transformed into an assignment containing $\omega$ that satisfies $\psi$.
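Because the definition only involves reducts and unit propagation, checking a PR clause is easy once the witness $\omega$ is known. The Python sketch below checks the two conditions of the definition; its helper names are hypothetical, it is not how DPR-TRIM is implemented, and it repeats a minimal reduct and unit propagation so that it runs on its own (DIMACS literals, clauses as frozensets).

```python
def reduce_formula(psi, alpha):
    """psi|alpha: drop satisfied clauses, remove falsified literals."""
    return {frozenset(l for l in c if -l not in alpha)
            for c in psi if not any(l in alpha for l in c)}


def unit_propagate_conflicts(psi):
    """True if unit propagation on psi derives a conflict."""
    alpha = set()
    while True:
        reduced = reduce_formula(psi, alpha)
        if frozenset() in reduced:
            return True
        units = {next(iter(c)) for c in reduced if len(c) == 1}
        if any(-u in units for u in units):
            return True
        if not units:
            return False
        alpha |= units


def implies_by_up(phi, psi):
    """phi |-1 psi: for each clause D in psi, propagating the assignment
    blocked by D (all literals of D false) on phi yields a conflict."""
    return all(unit_propagate_conflicts(set(phi) | {frozenset({-l}) for l in d})
               for d in psi)


def is_pr_with_witness(psi, clause, omega):
    """Check that omega satisfies C and psi|alpha |-1 psi|omega."""
    alpha = {-l for l in clause}                   # assignment blocked by C
    if not any(l in omega for l in clause):
        return False
    return implies_by_up(reduce_formula(psi, alpha), reduce_formula(psi, omega))


psi = {frozenset({1, 2})}                                 # the formula (x1 or x2)
print(is_pr_with_witness(psi, frozenset({-1}), {-1}))     # False
print(is_pr_with_witness(psi, frozenset({-1}), {-1, 2}))  # True
```

With $\omega = \bar{x}_1$ alone, the clause $(x_2)$ of $\psi|_\omega$ is not implied; extending the witness with $x_2$ satisfies it, so $(\bar{x}_1)$ is PR even though $\psi$ does not imply it.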

Clausal proof systems must be checkable in polynomial time to be useful in practice.
RUP and RAT are efficiently checkable due to unit propagation. In general, determining
if a clause is PR is an NP-complete problem [18]. However, a PR proof is checkable
in polynomial time if the witness assignments $\omega$ are included. A clausal proof with
witnesses has the form $\psi, (C_1, \omega_1), (C_2, \omega_2), \dots, (C_n, \omega_n)$. The proof checker DPR-TRIM
can efficiently check PR proofs that include witnesses. Further, DPR-TRIM can
emit proofs in the LPR format, which can be validated by the formally-verified checker
CAKE-LPR [31]; CAKE-LPR was used to validate results in recent SAT competitions.


4 Pruning Predicates and SADICAL 


Determining if a clause is PR is NP-complete and can naturally be formulated in SAT. 
Given a clause $C$ and formula $\psi$, a pruning predicate is a formula such that if it is
satisfiable, the clause $C$ is redundant w.r.t. $\psi$. SADICAL uses two pruning predicates
to determine if a clause is PR: positive reduct and filtered positive reduct. If either 
predicate is satisfiable, the satisfying assignment serves as the witness showing the 
clause is PR. 
Given a formula $\psi$ and assignment $\alpha$, the positive reduct is the formula $G \land C$
where $C$ is the clause that blocks $\alpha$ and $G = \{\mathrm{touched}_\alpha(D) \mid D \in \psi \text{ and } D|_\alpha = \top\}$.
If the positive reduct is satisfiable, the clause $C$ is PR w.r.t. $\psi$. The positive reduct is
satisfiable iff the clause that blocks $\alpha$ is a set-blocked clause [23].

Given a formula $\psi$ and assignment $\alpha$, the filtered positive reduct is the formula $G \land C$
where $C$ is the clause that blocks $\alpha$ and $G = \{\mathrm{touched}_\alpha(D) \mid D \in \psi \text{ and } \psi|_\alpha \nvdash_1
\mathrm{untouched}_\alpha(D)\}$, with $\mathrm{untouched}_\alpha(D) = D \setminus \mathrm{touched}_\alpha(D)$. If the filtered positive reduct
is satisfiable, the clause $C$ is PR w.r.t. $\psi$. The filtered positive reduct is a subset of the
positive reduct and is satisfiable iff the clause that blocks $\alpha$ is a set-propagation redundant
clause [14]. Example 1 shows a formula for which the positive and filtered positive
reducts are different, and only the filtered positive reduct is satisfiable.


Example 1. Given the formula $(x_1 \lor x_2) \land (\bar{x}_1 \lor x_2)$, the positive reduct with $\alpha = x_1$
is $(x_1) \land (\bar{x}_1)$, which is unsatisfiable. The clause $(x_1)$ can be filtered, giving the filtered
positive reduct $(\bar{x}_1)$, which is satisfiable.
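The two pruning predicates can be computed and compared on Example 1 with the Python sketch below. It is a toy: satisfiability of the reducts is decided by brute-force enumeration rather than by an inner CDCL solver as in SADICAL, all function names are hypothetical, and literals follow DIMACS conventions, with $-x$ standing for $\bar{x}$.

```python
from itertools import product


def reduce_formula(psi, alpha):
    """psi|alpha: drop satisfied clauses, remove falsified literals."""
    return {frozenset(l for l in c if -l not in alpha)
            for c in psi if not any(l in alpha for l in c)}


def up_conflict(psi):
    """True if unit propagation on psi derives a conflict."""
    alpha = set()
    while True:
        reduced = reduce_formula(psi, alpha)
        units = {next(iter(c)) for c in reduced if len(c) == 1}
        if frozenset() in reduced or any(-u in units for u in units):
            return True
        if not units:
            return False
        alpha |= units


def split(clause, alpha):
    """Return (touched_alpha(C), untouched_alpha(C))."""
    assigned = {abs(l) for l in alpha}
    t = frozenset(l for l in clause if abs(l) in assigned)
    return t, clause - t


def positive_reduct(psi, alpha):
    c = frozenset(-l for l in alpha)               # clause blocking alpha
    return {split(d, alpha)[0] for d in psi if any(l in alpha for l in d)} | {c}


def filtered_positive_reduct(psi, alpha):
    c = frozenset(-l for l in alpha)
    psi_alpha = reduce_formula(psi, alpha)
    keep = set()
    for d in psi:
        t, u = split(d, alpha)
        # keep touched_alpha(D) unless psi|alpha |-1 untouched_alpha(D)
        if not up_conflict(psi_alpha | {frozenset({-l}) for l in u}):
            keep.add(t)
    return keep | {c}


def satisfiable(clauses):
    """Brute-force satisfiability check (only for tiny examples)."""
    vs = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(vs)):
        value = dict(zip(vs, bits))
        if all(any(value[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False


psi = {frozenset({1, 2}), frozenset({-1, 2})}            # Example 1
print(satisfiable(positive_reduct(psi, {1})))           # False
print(satisfiable(filtered_positive_reduct(psi, {1})))  # True
```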


SADICAL [16] uses satisfaction-driven clause learning (SDCL) that extends CDCL 
by learning PR clauses [18] based on (filtered) positive reducts. SADICAL uses an in- 
ner/outer solver framework. The outer solver attempts to solve the SAT problem with 
SDCL. SDCL diverges from the basic CDCL loop when unit propagation after a deci- 
sion does not derive a conflict. In this case a reduct is generated using the current as- 
signment, and the inner solver attempts to solve the reduct using CDCL. If the reduct is 
satisfiable, the PR clause blocking the current assignment is learned, and the SDCL loop 
continues. The PR clause can be simplified by removing all non-decision variables from 
the assignment. SADICAL emits PR proofs by logging the satisfying assignment of the 
reduct as the witness, and these proofs are verified with DPR-TRIM. The key to SADI- 
CAL finding good PR clauses leading to short proofs is the decision heuristic, because 
variable selection builds the candidate PR clauses. Hand-crafted decision heuristics en- 
able SADICAL to find short proofs on pigeonhole and mutilated chessboard problems. 
However, these heuristics differ significantly from the score-based heuristics in most 
CDCL solvers. Our experiences with SADICAL suggest that improving the heuristics
for SDCL reduces the performance of CDCL and the other way around. This may ex- 
plain why SADICAL performs worse than standard CDCL solvers on the majority of 
the SAT competition benchmarks. While SADICAL integrates finding PR clauses of 
arbitrary size in the main search loop, our tool focuses on learning short PR clauses as 
a preprocessing step. This allows us to develop good heuristics for PR learning without 
compromising the main search loop. 


5 Extracting PR Clauses 


The goal of PRELEARN is to find useful PR clauses that improve the performance of 
CDCL solvers on both satisfiable and unsatisfiable instances. Figure 1 shows how a
SAT problem is solved using PRELEARN. For some preset time limit, PR clauses are 
found and then added to the original formula. Interleaved in this process is failed literal 
probing to check if unit clauses can be learned. When the preprocessing stage ends, 
the new formula that includes learned PR clauses is solved by a CDCL solver. If the
formula is satisfiable, the solver will produce a satisfying assignment. If the formula is
unsatisfiable, a refutation proof of the original formula can be computed by combining
the satisfaction-preserving proof from PRELEARN and the refutation proof emitted by
the CDCL solver. The complete proof can be verified with DPR-TRIM.

[Figure: CNF → PRELEARN → CDCL → Proof Checker; PRELEARN passes PR clauses to the CDCL solver and a PR proof to the checker, and the CDCL solver passes a DRAT proof to the checker.]

Fig. 1. Solving a formula with PRELEARN and a CDCL solver.

PRELEARN alternates between finding PR clauses and learning PR clauses. Candi- 
date PR clauses are found by iterating over each variable in the formula, and for each 
variable constructing clauses that include that variable. To determine if a clause is PR, 
the positive reduct generated by that clause is solved. It can be costly to generate and 
solve many positive reducts, so heuristics are used to find candidate clauses that are 
more likely to be PR. It is possible to find multiple PR clauses that conflict with each 
other. PR clauses are conflicting if adding one of the PR clauses to the formula makes 
the other no longer PR. Learning PR clauses involves selecting PR clauses that are non- 
conflicting. The selection may maximize the number of PR clauses learned or optimize 
for some other metric. Adding PR clauses and units derived from probing may cause 
new clauses to become PR, so the entire process is iterated multiple times. 


5.1 Finding PR Clauses 


PR clauses are found by constructing a set of candidate clauses and solving the positive 
reduct generated by each clause. In SADICAL the candidates are the clauses blocking 
the partial assignment of the solver after each decision in the SDCL loop that does 
not derive a conflict. In effect, candidates are constructed using the solver’s variable 
decision heuristic. We take a more general approach, constructing sets of candidates for 
each variable based on unit propagation and the partial assignment’s neighbors. 

For a variable $x$, $\mathrm{neighbors}(x)$ denotes the set of variables occurring in clauses
containing literal $x$ or $\bar{x}$, excluding variable $x$. For a partial assignment $\alpha$, $\mathrm{neighbors}(\alpha)$
denotes $\bigcup_{x \in \mathrm{var}(\alpha)} \mathrm{neighbors}(x) \setminus \mathrm{var}(\alpha)$. Candidate clauses for a literal $l$ are generated
in the following way:


- Let $\alpha$ be the partial assignment found by unit propagation starting with the assignment
  that makes $l$ true.
- Generate the candidate PR clauses $\{(\bar{l} \lor y), (\bar{l} \lor \bar{y}) \mid y \in \mathrm{neighbors}(\alpha)\}$.


Example 2 shows how candidate binary clauses are constructed using both polarities 
of an initial variable x. In Example 3 the depth is expanded to reach more variables and 
create larger sets of candidate clauses. The depth parameter is used in Section 5.4. 
Example 2. Consider the following formula: $(x_1 \lor \bar{x}_2) \land (\bar{x}_1 \lor x_3) \land (x_1 \lor x_4 \lor x_5) \land
(x_2 \lor x_6 \lor x_7) \land (x_3 \lor x_7 \lor x_8) \land (x_8 \lor x_9)$.

Case 1: We start with $\mathrm{var}(x_1) = 1$ and perform unit propagation resulting in $\alpha =
\{x_1 x_3\}$. Observe that $\mathrm{neighbors}(\alpha) = \{x_2, x_4, x_5, x_7, x_8\}$. The generated candidate
clauses are $(\bar{x}_1 \lor x_2), (\bar{x}_1 \lor \bar{x}_2), (\bar{x}_1 \lor x_4), (\bar{x}_1 \lor \bar{x}_4), \dots, (\bar{x}_1 \lor x_8), (\bar{x}_1 \lor \bar{x}_8)$.
Case 2: We start with $\mathrm{var}(x_1) = 0$ and perform unit propagation resulting in $\alpha =
\{\bar{x}_1 \bar{x}_2\}$. Observe that $\mathrm{neighbors}(\alpha) = \{x_3, x_4, x_5, x_6, x_7\}$. The generated candidate
clauses are $(x_1 \lor x_3), (x_1 \lor \bar{x}_3), (x_1 \lor x_4), (x_1 \lor \bar{x}_4), \dots, (x_1 \lor x_7), (x_1 \lor \bar{x}_7)$.


Example 3. Take the formula from Example 2 and assignment of $\mathrm{var}(x_1) = 1$ as in
case 1. The set of candidate clauses can be expanded by also considering the unassigned
neighbors of the variables in $\mathrm{neighbors}(\alpha)$. For example, $\mathrm{neighbors}(x_8) =
\{x_3, x_7, x_9\}$, of which $x_9$ is new and unassigned. This adds $(\bar{x}_1 \lor x_9)$ and $(\bar{x}_1 \lor \bar{x}_9)$ to
the set of candidate clauses. This can be iterated by including neighbors of new unassigned
variables from the prior step.
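Candidate generation with the neighbors heuristic, including the depth expansion of Example 3, fits in a short Python sketch. Function names are hypothetical, literals are DIMACS integers, and unit propagation is a naive fixpoint rather than PRELEARN's watch-pointer implementation. Running it on the formula of Example 2 reproduces the neighbor sets of both cases.

```python
def unit_propagate(psi, alpha):
    """Naive unit propagation; returns (assignment, conflict)."""
    alpha = set(alpha)
    while True:
        if any(-l in alpha for l in alpha):
            return alpha, True
        reduced = [frozenset(l for l in c if -l not in alpha)
                   for c in psi if not any(l in alpha for l in c)]
        if frozenset() in reduced:
            return alpha, True
        units = {next(iter(c)) for c in reduced if len(c) == 1}
        if not units:
            return alpha, False
        alpha |= units


def var_neighbors(psi, x):
    """neighbors(x): variables sharing a clause with x or -x, excluding x."""
    return {abs(l) for c in psi if any(abs(k) == x for k in c) for l in c} - {x}


def assignment_neighbors(psi, alpha, depth=1):
    """neighbors(alpha), expanded `depth` times as in Example 3."""
    assigned = {abs(l) for l in alpha}
    result, frontier = set(), assigned
    for _ in range(depth):
        new = set().union(*(var_neighbors(psi, x) for x in frontier))
        new -= assigned | result
        result |= new
        frontier = new
    return result


def candidates(psi, lit, depth=1):
    """Binary candidates (-lit or y), (-lit or -y) for y in neighbors(alpha)."""
    alpha, conflict = unit_propagate(psi, {lit})
    if conflict:
        return None   # lit is a failed literal; learn the unit -lit instead
    return {frozenset({-lit, s * y})
            for y in assignment_neighbors(psi, alpha, depth) for s in (1, -1)}


# Example 2 (DIMACS literals).
psi = [frozenset(c) for c in [(1, -2), (-1, 3), (1, 4, 5), (2, 6, 7), (3, 7, 8), (8, 9)]]
print(sorted(assignment_neighbors(psi, unit_propagate(psi, {1})[0])))   # [2, 4, 5, 7, 8]
print(sorted(assignment_neighbors(psi, unit_propagate(psi, {-1})[0])))  # [3, 4, 5, 6, 7]
```

With `depth=2`, case 1 also picks up $x_9$ (via $x_8$), matching Example 3; it additionally picks up $x_6$ via $x_2$.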


We consider both polarities when constructing candidates for a variable. After all 
candidates for a variable are constructed, the positive reduct for each candidate is gen- 
erated and solved in order. Note that propagated literals appearing in the partial assign- 
ment do not appear in the PR clause. The satisfying assignment is stored as the witness 
and the PR clause may be learned immediately depending on the learning configuration. 

This process is naturally extended to ternary clauses. The binary candidates are gen- 
erated, and for each candidate (x V y), x and y are assigned to false in the first step. The 
variables $z \in \mathrm{neighbors}(\alpha)$ yield clauses $(x \lor y \lor z)$ and $(x \lor y \lor \bar{z})$. This approach can
generate many candidate ternary clauses depending on the connectivity of the formula 
since each candidate binary clause is expanded. A filtering operation would be useful to 
avoid the blow-up in number of candidates. There are likely diminishing returns when 
searching for larger PR clauses because (1) there are more possible candidates, (2) the 
positive reducts are likely larger, and (3) each clause blocks less of the search space. 
We consider only unit and binary candidate clauses in our main evaluation. 

Ideally, we should construct candidate clauses that are likely PR to reduce the num- 
ber of failed reducts generated. Note, the (filtered) positive reduct can only be satisfiable 
if given the partial assignment there exists a reduced, satisfied clause. By focusing on 
neighbors, we guarantee that such a clause exists. The reduced heuristic in SADICAL 
finds variables in all reduced but unsatisfied clauses. The idea behind this heuristic is 
to direct the assignment towards conditional autarkies that imply a satisfiable positive 
reduct [18]. The neighbors approach generalizes this to variables in all reduced clauses 
whether or not they are unsatisfied. A comparison can be found in our repository. 


5.2 Learning PR Clauses 


Given multiple clauses that are PR w.r.t. the same formula, it is possible that some of 
the clauses conflict with each other and cannot be learned simultaneously. Example 4 
shows how learning one PR clause may invalidate the witness of another PR clause. It 
may be that a different witness exists, but finding it requires regenerating the positive 
reduct to include the learned PR clause and solving it. The simplest way to avoid conflicting
PR clauses is to learn PR clauses as they are found. When a reduct is satisfiable,
the PR clause is added to the formula and logged with its witness in the proof. Then
subsequent reducts will be generated from the formula including all added PR clauses.
Therefore, a satisfiable reduct ensures a PR clause can be learned.

Alternatively, clauses can be found in batches, then a subset of nonconflicting clauses 
can be learned. The set of conflicts between PR clauses can be computed in polynomial 
time. For each pair of PR clauses C and D, if the assignment that generated the pruning 
predicate for D touches C and C is not satisfied by the witness of D, then C con- 
flicts with D. In some cases reordering the two PR clauses may avoid a conflict. In 
Example 4, learning the second clause would not affect the validity of the first clause’s
witness. Once the conflicts are known, clauses can be learned based on some heuristic 
ordering. Batch learning configurations are discussed more in the following section. 


Example 4. Assume the following clause-witness pairs are valid in a formula $\psi$: $\{(x_1 \lor
x_2 \lor x_3), x_1 \bar{x}_2 \bar{x}_3\}$ and $\{(x_1 \lor \bar{x}_2 \lor x_4), \bar{x}_1 \bar{x}_2 x_4\}$. The first clause conflicts with the
second. If the first clause is added to $\psi$, the clause $(x_1 \lor x_2)$ would be in the positive
reduct for the second clause, but it is not satisfied by the witness of the second clause.
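The pairwise conflict test described above is straightforward to implement. The Python sketch below applies it, as stated in the text, to every ordered pair of clause-witness pairs; on the pairs of Example 4 it reports that the first clause conflicts with the second but not vice versa. Function names are hypothetical, and literals are DIMACS integers.

```python
def conflicts(c, d, omega_d):
    """True if C conflicts with D: the assignment blocked by D touches C and
    the witness of D does not satisfy C."""
    alpha_d_vars = {abs(l) for l in d}            # var(alpha_D) = var(D)
    touches = any(abs(l) in alpha_d_vars for l in c)
    return touches and not any(l in omega_d for l in c)


def conflict_pairs(pairs):
    """Ordered index pairs (i, j) such that clause i conflicts with clause j."""
    return {(i, j)
            for i, (c, _) in enumerate(pairs)
            for j, (d, omega_d) in enumerate(pairs)
            if i != j and conflicts(c, d, omega_d)}


# Example 4: (x1 or x2 or x3) with witness x1 -x2 -x3, and
# (x1 or -x2 or x4) with witness -x1 -x2 x4.
pairs = [(frozenset({1, 2, 3}), {1, -2, -3}),
         (frozenset({1, -2, 4}), {-1, -2, 4})]
print(conflict_pairs(pairs))  # {(0, 1)}
```

The asymmetry in the result is what makes ordering matter: learning the second clause first leaves the first clause's witness valid.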


5.3 Additional Configurations 


The sections above describe the PRELEARN configuration used in the main evaluation, 
i.e., finding candidate PR clauses with the neighbors heuristic and learning clauses in- 
stantly as the positive reducts are solved. In this section we present several additional 
configurations. The time-constrained reader may skip ahead to Section 5.4 for the pre- 
sentation of our main results. 

In batch learning, PR clauses are found in batches and then learned. Learning as
many nonconflicting clauses as possible coincides with the maximum independent set
problem, which is NP-hard. We approximate the solution by adding the clause
causing the fewest conflicts with unblocked clauses. When a clause is added, the clauses
it blocks are removed from the batch and conflict counts are recalculated. Alternatively,
clauses can be added in a random order. Random ordering requires less computation at
the cost of potentially fewer learned PR clauses.
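The greedy approximation can be sketched as a minimum-degree heuristic for independent sets on the conflict graph. In this Python sketch (hypothetical names), conflicts are treated as undirected, which is a simplification: it ignores that reordering two PR clauses can sometimes avoid a conflict. A random-order variant is included for comparison.

```python
import random


def _adjacency(n, conflict_pairs):
    adj = {i: set() for i in range(n)}
    for i, j in conflict_pairs:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def greedy_select(n, conflict_pairs):
    """Repeatedly learn the clause with the fewest conflicts among the
    remaining (unblocked) clauses, then drop the clauses it blocks."""
    adj = _adjacency(n, conflict_pairs)
    remaining, chosen = set(range(n)), []
    while remaining:
        best = min(remaining, key=lambda i: (len(adj[i] & remaining), i))
        chosen.append(best)
        remaining -= adj[best] | {best}   # counts are recomputed next round
    return chosen


def random_select(n, conflict_pairs, seed=0):
    """Cheaper variant: visit clauses in random order, learn each unblocked one."""
    adj = _adjacency(n, conflict_pairs)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    chosen = []
    for i in order:
        if not adj[i] & set(chosen):
            chosen.append(i)
    return chosen


# Star-shaped conflicts: clause 0 conflicts with clauses 1, 2 and 3.
print(greedy_select(4, {(0, 1), (0, 2), (0, 3)}))  # [1, 2, 3]
```

On the star example a random order that happens to start with clause 0 would learn only one clause, which illustrates the trade-off mentioned in the text.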

The neighbors heuristic for constructing candidate clauses can be modified to include
a depth parameter: in $\mathrm{neighbors}(i)$, $i$ indicates the number of iterations expanding the
variables. For example, $\mathrm{neighbors}(2)$ expands on the variables in $\mathrm{neighbors}(1)$, as seen in
Example 3. We also implement the reduced heuristic, shown in Example 5. Detailed
evaluations and comparisons can be found in our repository. In general, we found that 
the additional configurations did not improve on our main configuration. More work 
needs to be done to determine when and how to apply these additional configurations. 


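Example 5. Given the set of clauses $(x_1 \lor x_2 \lor x_3) \land (\bar{x}_1 \lor x_3 \lor x_4) \land (\bar{x}_3 \lor x_5)$ and initial assignment $\alpha = x_1$, only the second clause is reduced and not satisfied, giving $\mathit{reduced}(\alpha) = \{x_3, x_4\}$ and candidate clauses $(\bar{x}_1 \lor x_3)$, $(\bar{x}_1 \lor x_4)$, $(\bar{x}_1 \lor \bar{x}_3)$, $(\bar{x}_1 \lor \bar{x}_4)$.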
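The reduced heuristic of Example 5 can be sketched as below. It is a minimal reading of the example, not the PRELEARN implementation: a clause counts as reduced when the assignment falsifies at least one of its literals, literals are DIMACS integers, and the function name `reduced_candidates` is hypothetical.

```python
from typing import List, Set, Tuple


def reduced_candidates(clauses: List[List[int]],
                       alpha: Set[int]) -> Tuple[List[int], List[List[int]]]:
    """Return reduced(alpha) and the candidate clauses built from it."""
    assigned = {abs(lit) for lit in alpha}
    reduced_vars: Set[int] = set()
    for clause in clauses:
        satisfied = any(lit in alpha for lit in clause)
        reduced = any(-lit in alpha for lit in clause)  # some literal falsified by alpha
        if reduced and not satisfied:
            reduced_vars.update(abs(lit) for lit in clause if abs(lit) not in assigned)
    negated_alpha = sorted(-lit for lit in alpha)  # each candidate contains the negation of alpha
    candidates = [negated_alpha + [lit]
                  for var in sorted(reduced_vars)
                  for lit in (var, -var)]
    return sorted(reduced_vars), candidates


# Example 5: (x1 v x2 v x3), (-x1 v x3 v x4), (-x3 v x5) with alpha = x1
variables, cands = reduced_candidates([[1, 2, 3], [-1, 3, 4], [-3, 5]], {1})
print(variables)  # [3, 4]
print(cands)      # [[-1, 3], [-1, -3], [-1, 4], [-1, -4]]
```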


5.4 Implementation 


PRELEARN was implemented using the inner/outer-solver framework in SADICAL. 
The inner solver acts the same as in SADICAL, solving pruning predicates using CDCL. 
The outer solver is not used for SDCL, but the SDCL data-structures are used to find
and learn PR clauses. The outer solver is initialized with the original formula and main- 
tains the list of variables, clauses, and watch pointers. By default, the outer solver has 
no variables assigned other than learned units. When finding candidates, the variables 
in the partial clause are assigned in the outer solver. Unit propagation makes it possible 
to find all reduced clauses in the formula with a single pass. This is necessary for con- 
structing the positive reduct. After a candidate clause has been assigned and the positive 
reduct solved, the variables are unassigned. This returns the outer solver to the top-level 
before examining the next candidate. When a PR clause is learned, it is added to the 
formula along with its watch pointers. Additionally, failed literals are found if assign- 
ing a variable at the top-level causes a conflict through unit propagation. The negation 
of a failed literal is a unit that can be added to the formula. 
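Failed-literal detection can be illustrated with a naive propagation loop. This sketch rescans every clause instead of using watch pointers, and the names `unit_propagate` and `failed_literal_units` are our own; it only shows the logic of probing a literal at the top level and learning its negation on conflict.

```python
from typing import List, Optional, Set


def unit_propagate(clauses: List[List[int]], assignment: Set[int]) -> Optional[Set[int]]:
    """Naive unit propagation (no watch pointers).
    Returns the extended assignment, or None if some clause becomes falsified."""
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue                       # clause already satisfied
            open_lits = [lit for lit in clause if -lit not in assignment]
            if not open_lits:
                return None                    # conflict: every literal is false
            if len(open_lits) == 1:
                assignment.add(open_lits[0])   # unit clause: its last literal is forced
                changed = True
    return assignment


def failed_literal_units(clauses: List[List[int]], variables: List[int]) -> List[int]:
    """Probe both polarities of each variable on top of the units found so far.
    If assigning lit leads to a conflict, lit is failed and -lit is a new unit."""
    units: List[int] = []
    for var in variables:
        if var in units or -var in units:
            continue                           # already fixed at the top level
        for lit in (var, -var):
            if unit_propagate(clauses, set(units) | {lit}) is None:
                units.append(-lit)
                break
    return units


# x1 forces both x2 and -x2, so x1 is a failed literal and -x1 is learned as a unit.
print(failed_literal_units([[-1, 2], [-1, -2]], [1, 2]))  # [-1]
```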

In a single iteration each variable in the formula is processed in a breadth-first search 
(BFS) starting from the first variable in the numbering. When a variable is encountered 
it is first checked whether either assignment of the variable is a failed literal or a unit PR 
clause. If not, binary candidates are generated based on the selected heuristic and PR 
clauses are learned based on the learning configuration. Variables are added to the fron- 
tier of the BFS as they are encountered during candidate clause generation, but they are 
not repeated. Optionally, after all variables have been encountered the BFS restarts, now 
constructing ternary candidates. The repetition continues to the desired clause length. 
Then another iteration begins again with binary clauses. Running PRELEARN multi- 
ple times is important because adding PR clauses in one iteration may allow additional 
clauses to be added in the next. 
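The traversal order of a single iteration can be sketched as a breadth-first search over variables. As a simplifying assumption for this sketch, the variables "encountered" from a variable v are taken to be those sharing a clause with v, and the search restarts from the lowest unvisited variable so that every variable is processed; the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def bfs_variable_order(clauses: List[List[int]], num_vars: int) -> List[int]:
    """Visit every variable once, breadth-first, starting from variable 1.
    Assumption: neighbors of v are the variables sharing a clause with v."""
    neighbors: Dict[int, Set[int]] = {v: set() for v in range(1, num_vars + 1)}
    for clause in clauses:
        vs = {abs(lit) for lit in clause}
        for v in vs:
            neighbors[v] |= vs - {v}
    seen: Set[int] = set()
    order: List[int] = []
    for start in range(1, num_vars + 1):       # restart on disconnected parts
        if start in seen:
            continue
        seen.add(start)
        frontier = deque([start])
        while frontier:
            v = frontier.popleft()
            order.append(v)                    # a variable would be probed here
            for w in sorted(neighbors[v]):
                if w not in seen:              # variables are never repeated
                    seen.add(w)
                    frontier.append(w)
    return order


print(bfs_variable_order([[1, 3], [3, -5], [2, 4]], 5))  # [1, 3, 5, 2, 4]
```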


6 Mutilated Chessboard 


The mutilated chessboard is an n × n grid of alternating black and white squares with two opposite corners removed. The problem is whether the board can be covered with 2 × 1 dominoes. This can be encoded in CNF by using variables to represent domino placements on the board. At-most-one constraints (using the pairwise encoding) say only one domino can cover each square, and at-least-one constraints (using a disjunction) say some domino must cover each square.

Fig. 2. Occurrences of two horizontal dominoes may be replaced by two vertical dominoes in a solution. Similarly, occurrences of a horizontal domino atop two vertical dominoes can be replaced by shifting the horizontal domino down.

Fig. 3. Units and binary PR clauses learned in each execution (red dotted lines) for N = 20, until a contradiction was found. Markers on binary PR lines represent an iteration within an execution.
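The encoding just described can be generated with a short script. This is a sketch of the encoding as stated (one variable per domino placement, pairwise at-most-one, clausal at-least-one); the variable numbering is our own choice and need not match the benchmark generator used in the competitions.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def mutilated_chessboard_cnf(n: int) -> Tuple[int, List[List[int]]]:
    """CNF for covering an n x n board minus two opposite corners with 2 x 1 dominoes.
    Returns (number of variables, clauses) with DIMACS-style literals."""
    removed = {(0, 0), (n - 1, n - 1)}
    cells = [(r, c) for r in range(n) for c in range(n) if (r, c) not in removed]
    cell_set = set(cells)
    covering: Dict[Cell, List[int]] = {cell: [] for cell in cells}
    num_vars = 0
    for (r, c) in cells:
        for other in ((r, c + 1), (r + 1, c)):  # horizontal and vertical placement
            if other in cell_set:
                num_vars += 1                   # one variable per domino placement
                covering[(r, c)].append(num_vars)
                covering[other].append(num_vars)
    clauses: List[List[int]] = []
    for cell in cells:
        dominoes = covering[cell]
        clauses.append(list(dominoes))          # at least one domino covers the cell
        clauses.extend([-a, -b] for a, b in combinations(dominoes, 2))  # at most one
    return num_vars, clauses


num_vars, clauses = mutilated_chessboard_cnf(8)
print(num_vars, len(clauses))
```

For n ≥ 3 this yields 2n(n - 1) - 4 variables, since each removed corner eliminates exactly two placements.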

In recent SAT competitions, no proof-generating SAT solver could deal with in- 
stances larger than N = 18. In ongoing work, we found refutation proofs that contain 
only units and binary PR clauses for some boards of size N < 30. PRELEARN can be 
modified to automatically find proofs of this type. Running iterations of PRELEARN un- 
til saturation, meaning no new binary PR clauses or units can be found, yields some set 
of units and binary PR clauses. Removing the binary PR clauses from the formula and 
rerunning PRELEARN will yield additional units and a new set of binary PR clauses. 
Repeating the process of removing binary PR clauses and keeping units will eventually 
derive the empty clause for this problem. Figure 3 gives detailed values for N = 20. 
Within each execution (red dotted lines) there are at most 10 iterations (red tick mark- 
ers), and each iteration learns some set of binary PR clauses (red). Some executions 
saturate binary PR clauses before the tenth iteration and exit early. At the end of each 
execution the binary PR clauses are deleted, but the units (blue) are kept for the follow- 
ing execution. A complete DPR proof (PR with deletion) can be constructed by adding 
deletion information for the binary PR clauses removed between each execution when 
concatenating the PRELEARN proofs. The approach works for mutilated chess because 
in each execution there are many binary PR clauses that can be learned and will lead 
to units, but they are mutually exclusive and cannot be learned simultaneously. Further, 
adding units allows new binary PR clauses to be learned in following executions. 
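The bookkeeping across executions (keep units, delete binary PR clauses, concatenate proofs) can be sketched independently of the solver. Here `execute` is a hypothetical callback standing in for one PRELEARN execution run to saturation, and the proof lines it returns are treated as opaque strings; only the `d ... 0` deletion lines follow the usual DRAT/DPR convention.

```python
from typing import Callable, List, Tuple

# Hypothetical interface for one execution run to saturation:
# clauses -> (new units, binary PR clauses, proof lines, empty clause derived?)
Execution = Callable[[List[List[int]]], Tuple[List[int], List[List[int]], List[str], bool]]


def repeated_executions(clauses: List[List[int]], execute: Execution,
                        max_executions: int = 100) -> Tuple[bool, List[str]]:
    """Keep units across executions, drop binary PR clauses, concatenate proofs."""
    units: List[int] = []
    proof: List[str] = []
    for _ in range(max_executions):
        new_units, binaries, lines, refuted = execute(clauses + [[u] for u in units])
        proof.extend(lines)
        if refuted:
            return True, proof       # empty clause derived: the DPR proof is complete
        if not new_units:
            return False, proof      # no new units, so further executions cannot help
        units.extend(new_units)
        # delete this execution's binary PR clauses but keep the units
        proof.extend("d " + " ".join(map(str, b)) + " 0" for b in binaries)
    return False, proof
```

A stub for `execute` that returns scripted results is enough to exercise this loop without a solver.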

Table 1. Statistics for running multiple executions of PRELEARN on the mutilated chessboard problem with the configurations described below. Total units include failed literals and learned PR units. The average units and average binary PR clauses learned during each execution (Exe.) are shown as well.

 N    Time (s)   #Exe.   Avg. (s)   Total Units   Total Bin.   Avg. Units   Avg. Bin.
 8        0.14       1       0.14            30          164        30.00      164.00
12        4.94       1       4.94           103        1,045       103.00    1,045.00
16       62.47       2      31.23           195        3,988        97.50    1,994.00
20      513.12       6      85.52           339       14,470        56.50    2,411.67
24    4,941.38      26     190.05           512       64,038        19.69    2,463.00

Table 1 shows the statistics for PRELEARN. Achieving these results required some modifications to the configuration of PRELEARN. First, notice in Figure 2 that the PR clauses that can be learned involve blocking one domino orientation that can be replaced by a symmetric orientation. To optimize for these types of PR clauses, we only constructed candidates where the first literal was negative. The neighbors heuristic had to be increased to a depth of 6, meaning more candidates were generated for each variable. Intuitively, the proof is constructed by adding binary PR clauses in order to find
negative units (dominoes that cannot be placed) around the borders of the board. Following iterations build more units inwards, until units cover almost the entire board. This forces an impossible domino placement, leading to a contradiction. Complete proofs using only units and binary PR clauses were found for boards up to size N = 24 within 5,000 seconds. We verified all proofs using DPR-TRIM. The mutilated chessboard has a high degree of symmetry and structure, making it suitable for this approach. For most problems, we do not expect repeated executions that keep learned units to find new PR clauses.

Experiments were done with several configurations (see Section 5.3) to find the best 
results. We found that increasing the depth of neighbors was necessary for larger boards 
including N = 24. Increasing the depth allows more binary PR clauses to be found, at 
the cost of generating more reducts; this is necessary to find units. The reduced heuristic (a subset of neighbors) did not yield complete proofs. We also tried incrementing the depth after each execution, starting at 1 and resetting at 9. In this approach, executions with depth greater than 6 took longer but did not yield more unit clauses on average. We attempted batch learning on every 500 found clauses using either the random or the sorted heuristic. In each batch many of the 500 PR clauses blocked each
other because many conflicting PR clauses can be found on a small set of variables in 
mutilated chess. The PR clauses that were blocked would be found again in follow- 
ing iterations, leading to more reducts generated and solved. This caused much longer 
execution times. Adding PR clauses instantly is a good configuration for reducing exe- 
cution time when there are many conflicting clauses. However, for some less symmetric 
problems it may be worth the tradeoff to learn the clauses in batches, because learning 
a few bad PR clauses may disrupt the subsequent iterations. 


7 SAT Competition Benchmarks 


We evaluated PRELEARN on formulas from previous SAT competitions. Formulas from the main tracks of the '13, '15, '16, '19, '20, and '21 SAT competitions were grouped by size: 0-10k contains the 323 formulas with fewer than 10,000 clauses, and 10k-50k contains the 348 formulas with between 10,000 and 50,000 clauses.

Table 2. Fraction of benchmarks where PR clauses were learned, average runtime of PRELEARN, generated positive reducts and satisfiable positive reducts (PR clauses learned), and number of failed literals found.

Set       Benchmarks   Avg. (s)   Generated Reducts   Sat. Reducts   % Sat.   Failed Lits
0-10k     221/323         22.36         104,850,011        548,417    0.52%         3,416
10k-50k   237/348         71.08         163,014,068        789,281    0.48%         6,290

In general, short PR proofs
have been found for hard combinatorial problems typically having few clauses (0-10k). 
These include the pigeonhole and mutilated chessboard problems, some of which ap- 
pear in 0-10k benchmarks. The PR clauses that can be derived for these formulas are 
intuitive and almost always beneficial to solvers. Less is known about the impact of PR 
clauses on larger formulas, motivating our separation of test sets by size. The repository 
containing the preprocessing tool, experiment configurations, and experiment data can 
be found at https://github.com/jreeves3/PReLearn. 

We ran our experiments on StarExec [30]. The specs for the compute nodes can be found online.¹ The compute nodes that ran our experiments were Intel Xeon E5 cores with 2.4 GHz, and all experiments ran with 64 GB of memory and a 5,000 second timeout. We ran PRELEARN for up to 50 iterations within 100 seconds, exiting early if no new PR clauses were found in an iteration.

PRELEARN was executed as a stand-alone program, producing a derivation proof 
and a modified CNF. For experiments, the CDCL solver KISSAT [5] was called once on 
the original formula and once on the modified CNF. KISSAT was selected because of 
its high rankings in previous SAT competitions, but we expect the results to generalize
to other CDCL SAT solvers. 

Derivation proofs from PRELEARN were verified in all solved instances using the 
independent proof checker DPR-TRIM using a forward check. This can be extended to 
complete proofs in the following way. In the unsatisfiable case the proof for the learned 
PR clauses is concatenated to the proof traced by KISSAT, and the complete proof is 
verified against the original formula. In the satisfiable case the partial proof for the 
learned PR clauses is verified using a forward check in DPR-TRIM, and the satisfying 
assignment found by KISSAT is verified by the StarExec post-processing tool. Due to 
resource limitations, we verified a subset of complete proofs in DPR-TRIM. This is 
more costly because it involves running KISSAT with proof logging, then running DPR- 
TRIM on the complete proof. 

Table 2 shows the cumulative statistics for running PRELEARN on the benchmark 
sets. Note the number of satisfiable reducts is the number of learned PR clauses, because 
PR clauses are learned immediately after the reduct is solved. These include both unit 
and binary PR clauses. A very small percentage of generated reducts is satisfiable, and 
subsequently learned. This matters less for small formulas, where reducts can be computed quickly and there are fewer candidates to consider. However, for the 10k-50k formulas the average runtime more than triples while the number of generated reducts less than doubles, suggesting that each reduct is more expensive to compute on larger formulas. PR clauses are found in about two thirds of the formulas, showing our approach generalizes beyond the canonical problems for which we knew PR clauses existed. Expanding the exploration and increasing the time limit did not help to find PR clauses in the remaining one third.

¹ https://starexec.org/starexec/public/about.jsp

Table 3. Number of total solved instances and exclusively solved instances running KISSAT with and without PRELEARN, and number of instances improved by running KISSAT with PRELEARN. PRELEARN execution times were included in total execution times.

                           0-10k SAT   0-10k UNSAT   10k-50k SAT   10k-50k UNSAT
Total w/ PRELEARN                 84           149           143              89
Total w/o PRELEARN                80           141           143              91
Exclusively w/ PRELEARN            4            10             4               1
Exclusively w/o PRELEARN           0             2             4               3
Improved w/ PRELEARN              20            44            25              13

Table 3 gives a high-level picture of PRELEARN’s impact on KISSAT. PRELEARN 
significantly improves performance on 0-10k SAT and UNSAT benchmarks. These 
contain the hard combinatorial problems including pigeonhole that PRELEARN was 
expected to perform well on. There were 4 additional SAT formulas solved with PRE- 
LEARN that KISSAT alone could not solve. This shows that PRELEARN impacts not 
only hard unsatisfiable problems but satisfiable problems as well. On the other hand, 
the addition of PR clauses makes some problems more difficult. This is clear with the 
10k-50k results, where 5 benchmarks are solved exclusively with PRELEARN and 7 are 
solved exclusively without. Additionally, PRELEARN improved KISSAT’s performance 
on 102 of 671 benchmarks (approx. 15%). This is a large portion of benchmarks,
both SAT and UNSAT, for which PRELEARN is helpful. 

Figure 4 gives a more detailed picture on the impact of PRELEARN per benchmark. 
In the scatter plot the left-hand end of each line indicates the KISSAT execution time, 
while the length of the line indicates the PRELEARN execution time, and so the right- 
hand end gives the total time for PRELEARN plus KISSAT. Lines that cross the diagonal 
indicate that the preprocessing improved KISSAT’s performance but ran for longer than 
the improvement. PRELEARN improved performance for points above the diagonal. 
Points on the dotted lines (timeout) are solved by one configuration and not the other.

The top plot gives the results for the 0-10k formulas, with many points on the top 
timeout line as expected. These are the hard combinatorial problems that can only be 
solved with PRELEARN. In general, the unsatisfiable formulas benefit more than the 
satisfiable formulas. PR clauses can reduce the number of solutions in a formula and 
this may explain the negative impact on many satisfiable formulas. However, there are 
still some satisfiable formulas that are only solved with PRELEARN. 

In the bottom plot, formulas that take a long time to solve (above the diagonal in the 
upper right-hand corner) are helped more by PRELEARN. In the bottom half of the plot, 
many lines cross the diagonal meaning the addition of PR clauses provided a negligible 
benefit. For this set there are more satisfiable formulas for which PRELEARN is helpful. 
Fig. 4. Execution times w/ and w/o PRELEARN on 0-10k (top) and 10k-50k (bottom) benchmarks. The left-hand point of each segment shows the time for the SAT solver alone; the right-hand point indicates the combined time for preprocessing and solving.
Table 4. Some formulas solved by KISSAT exclusively with PRELEARN (top) and some formulas solved exclusively without PRELEARN (bottom). (*) solved without KISSAT. Clauses include PR clauses and failed literals learned. Times are in seconds; a dash marks an unsolved instance.

Set       Result   With (s)   Without (s)   Clauses   Formula                        Year
0-10k     UNSAT        1.26             -     2,033   phi2*                          2013
0-10k     UNSAT       35.69             -    20,179   Pb-chnl15-16_c18*              2019
0-10k     UNSAT      105.01             -    46,759   Pb-chnl20-21_c18               2019
0-10k     UNSAT       59.99             -     1,633   randomG-Mix-n17-d05            2021
0-10k     UNSAT       61.08             -     1,472   randomG-n17-d05                2021
0-10k     UNSAT      407.51             -     1,640   randomG-n18-d05                2021
0-10k     UNSAT      584.95             -     1,706   randomG-Mix-n18-d05            2021
0-10k     SAT      1,082.62             -     9,650   fsf-300-354-2-2-3-2.23.opt     2013
0-10k     SAT      1,250.82             -    10,058   fsf-300-354-2-2-3-2.46.opt     2013
10k-50k   SAT      1,076.34             -       804   sp5-26-19-bin-stri-flat-noid   2021
10k-50k   SAT        608.48             -       901   sp5-26-19-una-nons-tree-noid   2021

10k-50k   SAT             -         22.99       254   Ptn-7824-b13                   2016
10k-50k   SAT             -        549.27       133   Ptn-7824-b09                   2016
10k-50k   SAT             -      1,246.42        39   Ptn-7824-b02                   2016
10k-50k   SAT             -      1,290.49       121   Ptn-7824-b08                   2016
10k-50k   UNSAT           -      3,650.21    31,860   rphp4_110_shuffled             2016
10k-50k   UNSAT           -      4,273.88    31,531   rphp4_115_shuffled             2016


The results in Figure 4 are encouraging, with many formulas significantly benefit- 
ting from PRELEARN. PRELEARN improves the performance on both SAT and UN- 
SAT formulas of varying size and difficulty. In addition, lines that cross the diagonal 
imply that improving the runtime efficiency of PRELEARN alone would produce more 
improved instances. For future work, it would be beneficial to classify formulas before 
running PRELEARN. There may exist general properties of a formula that signal when 
PRELEARN will be useful and when PRELEARN will be harmful to a CDCL solver. 
For instance, a formula’s community structure [2] may help focus the search to parts of 
the formula where PR clauses are beneficial. 


7.1 Benchmark Families 


In this section we analyze benchmark families that PRELEARN had the greatest positive 
(negative) effect on, found in Table 4. Studying the formulas PRELEARN works well 
on may reveal better heuristics for finding good PR clauses. 

It has been shown that PR works well for hard combinatorial problems based on 
perfect matchings [14,15]. The perfect matching benchmarks (randomG) [7] are a gen- 
eralization of the pigeonhole (php) and mutilated chessboard problems with varying 
at-most-one encodings and edge densities. The binary PR clauses can be intuitively 
understood as blocking two edges from the perfect matching if there exist two other
edges that match the same nodes. These benchmarks are relatively small but extremely 
hard for CDCL solvers. Symmetry-breaking with PR clauses greatly reduces the search 
space and leads KISSAT to a short proof of unsatisfiability. PRELEARN also benefits 
other hard combinatorial problems that use pseudo-Boolean constraints. The pseudo-Boolean (Pb-chnl) [24] benchmarks are based on at-most-one constraints (using the
pairwise encoding) and at-least-one constraints. These formulas have a similar graph- 
ical structure to the perfect matching benchmarks. Binary PR clauses block two edges 
when another set of edges exists that are incident to the same nodes. 

For the other two benchmark families that benefited from PRELEARN, the intuition 
behind PR learning is less clear. The fixed-shape random formulas (fsf) [29] are pa- 
rameterized non-clausal random formulas built from hyper-clauses. The SAT encoding 
makes use of the Plaisted-Greenbaum transformation, introducing circuit-like structure 
to the problem. The superpermutation problem (sp) [22] asks whether a sequence of digits 1-n of length l can contain every permutation of [1, n] as a substring, and the optimization variant asks for the smallest such l given n. The sequence of l digits is encoded directly and passed through a multi-layered circuit that checks for the existence of each individual permutation. Digits use the binary (bin) or unary (una) encoding. An encoding is strict (stri) if clauses constrain digit bits to valid encodings and nonstrict (nons) otherwise, and the circuit is flat if it is a large AND, or tree for nested prefix-recognizing circuits. The formulas given ask to find a prefix of a superpermutation for n = 5 of length 26 with 19 permutations. The check for 19 permutations was encoded as cardinality constraints in a pseudo-Boolean instance, then converted back to SAT. Each individual permutation is checked by duplicating circuits at each possible starting position of the permutation in the sequence. PR clauses may be pruning certain starting positions for some permutations or
affecting the pseudo-Boolean constraints. This cannot be determined without a deeper 
knowledge of the benchmark generator. 

The relativized pigeonhole problem (rphp) [3] involves placing k pigeons in k - 1 holes with n nesting places. This problem has polynomial hardness for resolution, unlike the exponential hardness of the classical pigeonhole problem. The symmetry-breaking preprocessor BREAKID [9] generates symmetry-breaking formulas for rphp that are easy for a CDCL solver. PRELEARN can learn many PR clauses, but the formula does not become easier. Note that PRELEARN can solve php with n = 12 in a second.

One problem is clause and variable permuting (a.k.a. shuffling). The mutilated 
chessboard problem can still be solved by PRELEARN after permuting variables and 
clauses. The pigeonhole problem can be solved after permuting clauses but not after 
permuting variable names. In PRELEARN, PR candidates are sorted by variable name 
independent of clause ordering, but when the variable names change the order of learned 
clauses changes. In the mutilated chessboard problem there is local structure, so simi- 
lar PR clauses are learned under variable renaming. In the pigeonhole problem there is 
global structure, so a variable renaming can significantly change the binary PR clauses 
learned and cause earlier saturation with far fewer units. 

Another problem is that the addition of PR clauses can change the existing structure 
of a formula and negatively affect CDCL heuristics. The Pythagorean Triples Problem 
(Ptn) [19] asks whether monochromatic solutions of the equation $a^2 + b^2 = c^2$ can be avoided. The formulas encode the numbers {1, ..., 7824}, for which a valid 2-coloring is possible. In the formula names, the N in bN denotes the number of backbone literals added to the formula. A backbone literal is a literal assigned true in every solution. Adding more than 20 backbone literals makes the problem easy. For each formula, KISSAT alone finds a satisfying assignment but times out after the PR clauses are added. For one instance, adding only 39 PR clauses leads to a timeout. In some hard SAT and
UNSAT problems solvers require some amount of luck and adding a few clauses or 
shuffling a formula can cause a CDCL solver’s performance to sharply decrease. The 
Pythagorean Triples problem was originally solved with a local search solver, and local 
search still performs well after adding PR clauses. 

In a straightforward way, one can avoid the negative effects of adding harmful PR
clauses by running two solvers in parallel: one with PRELEARN and one without. This 
fits with the portfolio approach for solving SAT problems. 
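A two-solver portfolio of this kind can be sketched with Python's standard `subprocess` module: start both runs, take whichever finishes first, and kill the other. The command lines in the comment are placeholders (the `prelearn_then_kissat.sh` script is hypothetical), and the sketch treats any exit as "finished"; it relies on the common SAT-solver convention of exit code 10 for SAT and 20 for UNSAT.

```python
import subprocess
import time
from typing import List, Sequence, Tuple


def race(commands: Sequence[List[str]], poll_interval: float = 0.05) -> Tuple[int, int]:
    """Start all commands; return (index, exit code) of the first one to exit
    and kill the rest. Output is discarded to avoid filling pipe buffers."""
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL) for cmd in commands]
    try:
        while True:
            for i, proc in enumerate(procs):
                code = proc.poll()
                if code is not None:
                    return i, code
            time.sleep(poll_interval)
    finally:
        for proc in procs:
            if proc.poll() is None:
                proc.kill()
            proc.wait()


# Hypothetical usage; paths and the preprocessing script are placeholders:
# race([["kissat", "formula.cnf"], ["sh", "prelearn_then_kissat.sh", "formula.cnf"]])
```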


8 Conclusion and Future Work 


In this paper we presented PRELEARN, a tool built from the SADICAL framework 
that learns PR clauses in a preprocessing stage. We developed several heuristics for 
finding PR clauses and multiple configurations for clause learning. In the evaluation we 
found that PRELEARN improves the performance of the CDCL solver KISSAT on many 
benchmarks from past SAT competitions. 

For future work, quantifying the usefulness of each PR clause in relation to guid- 
ing the CDCL solver may lead to better learning heuristics. This is a difficult task that 
likely requires problem-specific information. Separately, failed clause caching could improve performance by remembering and avoiding candidate clauses that fail with unsatisfiable reducts in multiple iterations. This would be most beneficial for problems like
the mutilated chessboard that have many conflicting PR clauses. Lastly, incorporating 
PRELEARN during in-processing may allow for more PR clauses to be learned. This 
could be implemented with the inner/outer solver framework but would require a sig- 
nificantly narrowed search. CDCL learns many clauses during execution and it would 
be infeasible to examine binary PR clauses across the entire formula. 


Acknowledgements. We thank the community at StarExec for providing computational 
resources. 
Abstract. Dynamic arrays, also referred to as vectors, are fundamental data
structures used in many programs. Modeling their semantics efficiently is cru- 
cial when reasoning about such programs. The theory of arrays is widely sup- 
ported but is not ideal, because the number of elements is fixed (determined by 
its index sort) and cannot be adjusted, which is a problem, given that the length 
of vectors often plays an important role when reasoning about vector programs. 
In this paper, we propose reasoning about vectors using a theory of sequences. 
We introduce the theory, propose a basic calculus adapted from one for the the- 
ory of strings, and extend it to efficiently handle common vector operations. We 
prove that our calculus is sound and show how to construct a model when it ter- 
minates with a saturated configuration. Finally, we describe an implementation 
of the calculus in cvc5 and demonstrate its efficacy by evaluating it on verifica- 
tion conditions for smart contracts and benchmarks derived from existing array 
benchmarks. 


1 Introduction 


Generic vectors are used in many programming languages. For example, in C++'s standard library, they are provided by std::vector. Automated verification of software
systems that manipulate vectors requires an efficient and automated way of reason- 
ing about them. Desirable characteristics of any approach for reasoning about vec- 
tors include: (i) expressiveness—operations that are commonly performed on vectors 
should be supported; (ii) generality—vectors are always “vectors of” some type (e.g., 
vectors of integers), and so it is desirable that vector reasoning be integrated within a 
more general framework; solvers for satisfiability modulo theories (SMT) provide such 
a framework and are widely used in verification tools (see [5] for a recent survey); (iii) 
efficiency—fast and efficient reasoning is essential for usability, especially as verifica- 
tion tools are increasingly used by non-experts and in continuous integration. 
Despite the ubiquity of vectors in software on the one hand and the effectiveness
of SMT solvers for software verification on the other hand, there is not currently a 
clean way to represent vectors using operators from the SMT-LIB standard [3]. While 
the theory of arrays can be used, it is not a great fit because arrays have a fixed size 
determined by their index type. Representing a dynamic array thus requires additional 
modeling work. Moreover, to reach an acceptable level of expressivity, quantifiers are 
needed, which often makes the reasoning engine less efficient and robust. Indeed, part 
of the motivation for this work was frustration with array-based modeling in the Move 
Prover, a verification framework for smart contracts [24] (see Sect. 6 for more information about the Move Prover and its use of vectors). The current paper bridges this
gap by studying and implementing a native theory of sequences in the SMT framework, 
which satisfies the desirable properties for vector reasoning listed above. 

We present two SMT-based calculi for determining satisfiability in the theory of 
sequences. Since the decidability of even weaker theories is unknown (see, e.g., [9, 15]), 
we do not aim for a decision procedure. Rather, we prove model and solution soundness 
(that is, when our procedure terminates, the answer is correct). Our first calculus lever- 
ages techniques for the theory of strings. We generalize these techniques, lifting rules 
specific to string characters to more general rules for arbitrary element types. By itself, 
this base calculus is already quite effective. However, it misses opportunities to per- 
form high-level vector-based reasoning. For example, both reading from and updating 
a vector are very common operations in programming, and reasoning efficiently about 
the corresponding sequence operators is thus crucial. Our second calculus addresses 
this gap by integrating reasoning methods from array solvers (which handle reads and 
updates efficiently) into the first procedure. Notice, however, that this integration is not 
trivial, as it must handle novel combinations of operators (such as the combination of 
update and read operators with concatenation) as well as out-of-bounds cases that do 
not occur with ordinary arrays. We have implemented both variants of our calculus in 
the cvc5 SMT solver [2] and evaluated them on benchmarks originating from the Move
Prover, as well as benchmarks that were translated from SMT-LIB array benchmarks.

As is typical, both of our calculi are agnostic to the sort of the elements in the 
sequence. Reasoning about sequences of elements from a particular theory can then 
be done via theory combination methods such as Nelson-Oppen [18] or polite combi- 
nation [16,20]. The former can be done for stably infinite theories (and the theory of 
sequences that we present here is stably infinite), while the latter requires investigating 
the politeness of the theory, which we expect to do in future work. 

The rest of the paper is organized as follows. Section 2 includes basic notions from 
first-order logic. Section 3 introduces the theory of sequences and shows how it can 
be used to model vectors. Section 4 presents calculi for this theory and discusses their 
correctness. Section 5 describes the implementation of these calculi in cvc5. Section 6 
presents an evaluation comparing several variations of the sequence solver in cvc5 and 
Z3. We conclude in Sect. 7 with directions for further research. 


Related Work: Our work crucially builds on a proposal by Bjørner et al. [8], but 
extends it in several key ways. First, their implementation (for a logic they call 
QF_BVRE) restricts the generality of the theory by allowing only bit-vector elements 
(representing characters) and assuming that sequences are bounded. In contrast, our 
calculus maintains full generality, allowing unbounded sequences and elements of arbitrary types. Second, while our core calculus focuses only on a subset of the operators in
[8], our implementation supports the remaining operators by reducing them to the core 
operators, and also adds native support for the update operator, which is not included 
in [8]. 

The base calculus that we present for sequences builds on similar work for the 
theory of strings [6,17]. We extend our base calculus to support array-like reasoning 
based on the weak-equivalence approach [10]. Though there exists some prior work on 
extending the theory of arrays with more operators and reasoning about length [1, 12, 
14], this work does not include support for most of the sequence operators we
consider here. 

The SMT-solver Z3 [11] also provides a solver for sequences. However, its docu- 
mentation is limited [7], it does not support update directly, and its internal algorithms 
are not described in the literature. Furthermore, as we show in Sect. 6, the performance 
of the Z3 implementation is generally inferior to our implementation in cvc5. 


2 Preliminaries 


We assume the usual notions and terminology of many-sorted first-order logic with
equality (see, e.g., [13] for a complete presentation). We consider many-sorted signatures
Σ, each containing a set of sort symbols (including a Boolean sort Bool), a family of
logical symbols ≈ for equality, with sort σ × σ → Bool for all sorts σ in Σ and interpreted
as the identity relation, and a set of interpreted (and sorted) function symbols. We assume
the usual definitions of well-sorted terms, literals, and formulas as terms of sort Bool. A
literal is flat if it has the form ⊥, p(x1, ..., xn), ¬p(x1, ..., xn), x ≈ y, ¬x ≈ y, or
x ≈ f(x1, ..., xn), where p and f are function symbols and x, y, and x1, ..., xn are
variables. A Σ-interpretation M is defined as usual, satisfying M(⊥) = false and assigning:
a set M(σ) to every sort σ of Σ, a function M(f) : M(σ1) × ... × M(σn) → M(σ) to any
function symbol f of Σ with arity σ1 × ... × σn → σ, and an element M(x) ∈ M(σ) to any
variable x of sort σ. The satisfaction relation between interpretations and formulas is
defined as usual and is denoted by ⊨.

A theory is a pair T = (Σ, I), in which Σ is a signature and I is a class of
Σ-interpretations, closed under variable reassignment. The models of T are the
interpretations in I without any variable assignments. A Σ-formula φ is satisfiable (resp.,
unsatisfiable) in T if it is satisfied by some (resp., no) interpretation in I. Given a (set
of) terms S, we write 𝒯(S) to denote the set of all subterms of S. For a theory
T = (Σ, I), a set S of Σ-formulas and a Σ-formula φ, we write S ⊨_T φ if every
interpretation M ∈ I that satisfies S also satisfies φ. By convention and unless otherwise
stated, we use letters w, x, y, z to denote variables and s, t, u, v to denote terms.

The theory T_LIA = (Σ_LIA, I_LIA) of linear integer arithmetic is based on the signature
Σ_LIA that includes a single sort Int, all natural numbers as constant symbols, the unary
− symbol, the binary + symbol and the binary ≤ relation. When k ∈ ℕ, we use the notation
k · x, inductively defined by 0 · x = 0 and (m + 1) · x = x + m · x. In turn, I_LIA
consists of all structures M for Σ_LIA in which the domain M(Int) of Int is the set of
integer numbers, for every constant symbol n ∈ ℕ, M(n) = n, and +, −, and ≤ are
interpreted as usual. We use standard notation for integer intervals (e.g., [a, b] for the
set of integers i where a ≤ i ≤ b, and [a, b) for the set where a ≤ i < b).

Symbol         Arity                      SMT-LIB        Description
n              Int                        n              All constants n ∈ ℕ
+              Int × Int → Int            +              Integer addition
−              Int → Int                  -              Unary integer minus
≤              Int × Int → Bool           <=             Integer inequality
ε              Seq                        seq.empty      The empty sequence
unit           Elem → Seq                 seq.unit       Sequence constructor
|_|            Seq → Int                  seq.len        Sequence length
nth            Seq × Int → Elem           seq.nth        Element access
update         Seq × Int × Elem → Seq     seq.update     Element update
extract        Seq × Int × Int → Seq      seq.extract    Extraction (subsequence)
_ ++ ... ++ _  Seq × ... × Seq → Seq      seq.concat     Concatenation

Fig. 1. Signature for the theory of sequences.


3 A Theory of Sequences 


We define the theory T_Seq of sequences. Its signature Σ_Seq is given in Fig. 1. It
includes the sorts Seq, Elem, Int, and Bool, intuitively denoting sequences, elements,
integers, and Booleans, respectively. The first four lines include symbols of Σ_LIA. We
write t1 ⋈ t2, with ⋈ ∈ {>, <, ≥}, as syntactic sugar for the equivalent literal expressed
using ≤ (and possibly ¬). The sequence symbols are given on the remaining lines. Their
arities are also given in Fig. 1. Notice that _ ++ ... ++ _ is a variadic function symbol.

Interpretations M of T_Seq interpret: Int as the set of integers; Elem as some set; Seq as
the set of finite sequences whose elements are from Elem; ε as the empty sequence; unit as
a function that takes an element from M(Elem) and returns the sequence that contains only
that element; nth as a function that takes an element s from M(Seq) and an integer i and
returns the ith element of s, in case i is non-negative and is smaller than the length of s
(we take the first element of a sequence to have index 0). Otherwise, the function has no
restrictions; update as a function that takes an element s from M(Seq), an integer i, and
an element a from M(Elem) and returns the sequence obtained from s by replacing its ith
element by a, in case i is non-negative and smaller than the length of s. Otherwise, the
returned value is s itself; extract as a function that takes a sequence s and integers i
and j, and returns the maximal sub-sequence of s that starts at index i and has length at
most j, in case both i and j are non-negative and i is smaller than the length of s.
Otherwise, the returned value is the empty sequence;¹ |_| as a function that takes a
sequence and returns its length; and _ ++ ... ++ _ as a function that takes some number of
sequences (at least 2) and returns their concatenation.


¹ In [8], the second argument j denotes the end index, while here it denotes the length of the
sub-sequence, in order to be consistent with the theory of strings in the SMT-LIB standard.
Notice that the interpretations of Elem and nth are not completely fixed by the
theory: Elem can be set arbitrarily, and nth is only defined by the theory for some
values of its second argument. For the rest, it can be set arbitrarily.
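To make these semantics concrete, here is a minimal C++ sketch of our own (not part of the cvc5 implementation) that models Elem as int and Seq as std::vector<int>. An out-of-bounds nth is reported as an empty std::optional, reflecting that the theory leaves it unconstrained; the function names mirror Fig. 1 and are otherwise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Illustration only: Elem is modeled as int and Seq as std::vector<int>.
// The theory itself leaves Elem arbitrary.
using Seq = std::vector<int>;

// nth(s, i): the i-th element (0-based) when 0 <= i < |s|. Outside that
// range T_Seq leaves nth unconstrained; this sketch reports "no value".
std::optional<int> nth(const Seq& s, long i) {
  if (i < 0 || i >= static_cast<long>(s.size())) return std::nullopt;
  return s[static_cast<std::size_t>(i)];
}

// update(s, i, a): s with its i-th element replaced by a when i is in
// bounds; otherwise s itself.
Seq update(Seq s, long i, int a) {
  if (i >= 0 && i < static_cast<long>(s.size())) s[static_cast<std::size_t>(i)] = a;
  return s;
}

// extract(s, i, j): the maximal subsequence of s that starts at index i and
// has length at most j, when i, j >= 0 and i < |s|; otherwise the empty sequence.
Seq extract(const Seq& s, long i, long j) {
  const long n = static_cast<long>(s.size());
  if (i < 0 || j < 0 || i >= n) return {};
  const long len = j < n - i ? j : n - i;
  return Seq(s.begin() + i, s.begin() + i + len);
}

// Binary case of the variadic concatenation _ ++ ... ++ _.
Seq concat(const Seq& a, const Seq& b) {
  Seq r = a;
  r.insert(r.end(), b.begin(), b.end());
  return r;
}
```

A model of T_Seq may pick any element for an out-of-bounds nth, so returning no value is only one way of marking that freedom.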


3.1 Vectors as Sequences 


We show the applicability of T_Seq by using it for a simple verification task. Consider the
C++ function swap at the top of Fig. 2. This function swaps two elements in a vector.
The comments above the function include a partial specification for it: if both indexes
are in bounds and the indexed elements are equal, then the function should not change
the vector (this is expressed by s_out == s). We now consider how to encode the
verification condition induced by the code and the specification. The function variables
a, b, i, and j can be encoded as variables of sort Int with the same names. We include two
copies of s: s for its value at the beginning, and s_out for its value at the end. But what
should be the sorts of s and s_out? In Fig. 2 we consider two options: one is based on
arrays and the other on sequences.


Example 1 (Arrays). The theory of arrays includes three sorts: index, element (in this
case, both are Int), and an array sort Arr, as well as two operators: x[i], interpreted as
the ith element of x; and x[i ← a], interpreted as the array obtained from x by setting
the element at index i to a. We declare s and s_out as variables of an uninterpreted sort
V and declare two functions ℓ and c, which, given v of sort V, return its length (of sort
Int) and content (of sort Arr), respectively.²

Next, we introduce functions to model vector operations: ≈_A for comparing vectors,
nth_A for reading from them, and update_A for updating them. These functions need to
be axiomatized. We include two axioms (bottom of Fig. 2): Ax1 states that two vectors
are equal iff they have the same length and the same contents. Ax2 axiomatizes the
update operator; the result has the same length, and if the updated index is in bounds,
then the corresponding element is updated. These axioms are not meant to be complete,
but are rather just strong enough for the example.

The first two lines of the swap function are encoded as equalities using nth_A, and
the last two lines are combined into one nested constraint that involves update_A. The
precondition of the specification is naturally modeled using nth_A, and the postcondition
is negated, so that the unsatisfiability of the formula entails the correctness of the
function w.r.t. the specification. Indeed, the conjunction of all formulas in this encoding
is unsatisfiable in the combined theories of arrays, integers, and uninterpreted functions.


The above encoding has two main shortcomings: It introduces auxiliary symbols, and 
it uses quantifiers, thus reducing clarity and efficiency. In the next example, we see how 
using the theory of sequences allows for a much more natural and succinct encoding. 


Example 2 (Sequences). In the sequences encoding, s and s_out have sort Seq. No
auxiliary sorts or functions are needed, as the theory symbols can be used directly.
Further, these symbols do not need to be axiomatized as their semantics is fixed by the
theory. The resulting formula, much shorter than in Example 1 and with no quantifiers, is
unsatisfiable in T_Seq.

² It is possible to obtain a similar encoding using the theory of datatypes; however, here we use
uninterpreted functions which are simpler and better supported by SMT solvers.

    #include <vector>

    // @pre:  0 <= i, j < s.size() and s[i] == s[j]
    // @post: s_out == s
    void swap(std::vector<int>& s, int i, int j) {
      int a = s[i];
      int b = s[j];
      s[i] = b;
      s[j] = a;
    }

Sequences encoding
  Problem variables:    a, b, i, j : Int;  s, s_out : Seq
  Auxiliary variables:  (none)
  Axioms:               (none)
  Program:              a ≈ nth(s, i) ∧ b ≈ nth(s, j)
                        s_out ≈ update(update(s, i, b), j, a)
  Spec.:                0 ≤ i, j < |s| ∧ nth(s, i) ≈ nth(s, j)
                        s_out ≉ s

Arrays encoding
  Problem variables:    a, b, i, j : Int;  s, s_out : V
  Auxiliary variables:  ℓ : V → Int,  c : V → Arr,  ≈_A : V × V → Bool,
                        nth_A : V × Int → Int,  update_A : V × Int × Int → V
  Axioms:               Ax1 ∧ Ax2
  Program:              a ≈ nth_A(s, i) ∧ b ≈ nth_A(s, j)
                        s_out ≈_A update_A(update_A(s, i, b), j, a)
  Spec.:                0 ≤ i, j < ℓ(s) ∧ nth_A(s, i) ≈ nth_A(s, j)
                        ¬(s_out ≈_A s)

  Ax1 := ∀x, y. x ≈_A y ↔ (ℓ(x) ≈ ℓ(y) ∧ ∀ 0 ≤ i < ℓ(x). c(x)[i] ≈ c(y)[i])
  Ax2 := ∀x, y, i, a. y ≈_A update_A(x, i, a) → (ℓ(x) ≈ ℓ(y) ∧ (0 ≤ i < ℓ(x) → c(y) ≈ c(x)[i ← a]))

Fig. 2. An example using T_Seq.
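As a sanity check on the sequence encoding of Fig. 2, the following C++ sketch (ours, with hypothetical helper names update and checkSwapSpec) enumerates all small vectors and compares the program against s_out ≈ update(update(s, i, b), j, a). It is a bounded test over a tiny value domain, not a substitute for the unsatisfiability check that an SMT solver performs.

```cpp
#include <cassert>
#include <vector>

// The function from Fig. 2.
void swap(std::vector<int>& s, int i, int j) {
  int a = s[i];
  int b = s[j];
  s[i] = b;
  s[j] = a;
}

// T_Seq's update on vectors: replace the i-th element by a
// (out-of-bounds indices leave s unchanged).
std::vector<int> update(std::vector<int> s, int i, int a) {
  if (i >= 0 && i < static_cast<int>(s.size())) s[i] = a;
  return s;
}

// Enumerates every vector of length <= maxLen over the values
// 0..numVals-1 and every in-bounds pair (i, j), and checks that
//  (1) the sequence encoding s_out = update(update(s, i, b), j, a) agrees
//      with running the code, and
//  (2) the specification holds: s[i] == s[j] implies s_out == s.
// This is a bounded test, not a proof.
bool checkSwapSpec(int maxLen, int numVals) {
  for (int len = 0; len <= maxLen; ++len) {
    int total = 1;
    for (int k = 0; k < len; ++k) total *= numVals;
    for (int code = 0; code < total; ++code) {
      // Decode `code` as a base-numVals vector of length len.
      std::vector<int> s(len);
      for (int k = 0, c = code; k < len; ++k, c /= numVals) s[k] = c % numVals;
      for (int i = 0; i < len; ++i) {
        for (int j = 0; j < len; ++j) {
          std::vector<int> sOut = s;
          swap(sOut, i, j);
          const int a = s[i], b = s[j];
          if (sOut != update(update(s, i, b), j, a)) return false;
          if (s[i] == s[j] && sOut != s) return false;
        }
      }
    }
  }
  return true;
}
```

For example, checkSwapSpec(3, 3) covers every vector of length at most 3 over {0, 1, 2}; dropping the s[i] == s[j] guard would make the second check fail on s = [0, 1], i = 0, j = 1.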


4 Calculi 


After introducing some definitions and assumptions, we describe a basic calculus for 
the theory of sequences, which adapts techniques from previous procedures for the 
theory of strings. In particular, the basic calculus reduces the operators nth and update 
by introducing concatenation terms. We then show how to extend the basic calculus by 
introducing additional rules inspired by solvers for the theory of arrays; the modified 
calculus can often reason about nth and update terms directly, avoiding the introduction 
of concatenation terms (which are typically expensive to reason about). 

Given a vector of sequence terms t = (t1, ..., tn), we use t̄ to denote the term
corresponding to the concatenation of t1, ..., tn. If n = 0, t̄ denotes ε, and if n = 1, t̄
denotes t1; otherwise (when n > 1), t̄ denotes a concatenation term having n children.
In our calculi, we distinguish between sequence and arithmetic constraints.


Definition 1. A Σ_Seq-formula φ is a sequence constraint if it has the form s ≈ t or
s ≉ t; it is an arithmetic constraint if it has the form s ≈ t, s ≥ t, s ≉ t, or s > t where
s, t are terms of sort Int, or if it is a disjunction c1 ∨ c2 of two arithmetic constraints.


|ε| → 0                      |unit(t)| → 1
|update(s, i, t)| → |s|      |s1 ++ ... ++ sn| → |s1| + ... + |sn|
u ++ ε ++ v → u ++ v         u ++ (s1 ++ ... ++ sn) ++ v → u ++ s1 ++ ... ++ sn ++ v

Fig. 3. Rewrite rules for the reduced form t↓ of a term t, obtained from t by applying these rules
to completion.

Notice that sequence constraints do not have to contain sequence terms (e.g., x ≈ y
where x, y are Elem-variables). Also, equalities and disequalities between terms of sort
Int are both sequence and arithmetic constraints. In this paper we focus on sequence
constraints and arithmetic constraints. This is justified by the following lemma. (Proofs
of this lemma and later results can be found in an extended version of this paper [23].)


Lemma 1. For every quantifier-free Σ_Seq-formula φ, there are sets S1, ..., Sn of
sequence constraints and sets A1, ..., An of arithmetic constraints such that φ is
T_Seq-satisfiable iff Si ∪ Ai is T_Seq-satisfiable for some i ∈ [1, n].


Throughout the presentation of the calculi, we will make a few simplifying assumptions.

Assumption 1. Whenever we refer to a set S of sequence constraints, we assume:

1. for every non-variable term t ∈ 𝒯(S), there exists a variable x such that x ≈ t ∈ S;
2. for every Seq-variable x, there exists a variable ℓ_x such that ℓ_x ≈ |x| ∈ S;
3. all literals in S are flat.

Whenever we refer to a set of arithmetic constraints, we assume all its literals are flat.


These assumptions are without loss of generality as any set can easily be transformed 
into an equisatisfiable set satisfying the assumptions by the addition of fresh variables 
and equalities. Note that some rules below introduce non-flat literals. In such cases, 
we assume that similar transformations are done immediately after applying the rule to 
maintain the invariant that all literals in S ∪ A are flat. Rules may also introduce fresh
variables k of sort Seq. We further assume that in such cases, a corresponding constraint
ℓ_k ≈ |k| is added to S with a fresh variable ℓ_k.


Definition 2. Let C be a set of constraints. We write C ⊨ φ to denote that C entails
formula φ in the empty theory, and write ≡_C to denote the binary relation over 𝒯(C)
such that s ≡_C t iff C ⊨ s ≈ t.


Lemma 2. For every set S of sequence constraints, ≡_S is an equivalence relation;
furthermore, every equivalence class of ≡_S contains at least one variable.


We denote the equivalence class of a term s according to ≡_S by [s]≡_S, and drop the ≡_S
subscript when it is clear from the context.
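Equivalence classes of ≡_S and a choice function α (Definition 5) can be maintained with a union-find structure. The C++ sketch below is a deliberate simplification under our own assumptions: terms are plain strings, only reflexivity, symmetry, and transitivity are applied (real congruence closure would additionally merge f(s1, ..., sn) and f(t1, ..., tn) when all si ≡_S ti), and α picks the lexicographically smallest variable of a class. The class name EqClasses is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// Union-find over term names. Closes equalities under reflexivity, symmetry,
// and transitivity only; congruence over function applications is omitted.
class EqClasses {
 public:
  void addVar(const std::string& v) { vars_.insert(v); addTerm(v); }
  void addTerm(const std::string& t) { parent_.emplace(t, t); }  // no-op if present

  void merge(const std::string& s, const std::string& t) {
    const std::string rs = find(s), rt = find(t);
    if (rs != rt) parent_[rs] = rt;
  }

  bool equiv(const std::string& s, const std::string& t) { return find(s) == find(t); }

  std::string find(const std::string& t) {
    addTerm(t);
    std::string root = t;
    while (parent_[root] != root) root = parent_[root];
    for (std::string cur = t; cur != root;) {  // path compression
      std::string next = parent_[cur];
      parent_[cur] = root;
      cur = next;
    }
    return root;
  }

  // A choice function alpha: the smallest variable (std::set iterates in
  // sorted order) in the class of t, or nullopt if the class has no variable.
  std::optional<std::string> alpha(const std::string& t) {
    const std::string r = find(t);
    for (const std::string& v : vars_)
      if (find(v) == r) return v;
    return std::nullopt;
  }

 private:
  std::map<std::string, std::string> parent_;
  std::set<std::string> vars_;
};
```

With the equalities of Example 3 (x ≈ y ++ z, y ≈ w ++ u, u ≈ v), u and v end up in one class and alpha("u") returns "u"; Example 3 instead assumes α picks v, which is equally valid since any variable of the class may be chosen.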

In the presentation of the calculus, it will often be useful to normalize terms to what 
will be called a reduced form. 


Definition 3. Let t be a Σ_Seq-term. The reduced form of t, denoted by t↓, is the term
obtained by applying the rewrite rules listed in Fig. 3 to completion.


Observe that t↓ is well defined because the given rewrite rules form a terminating
rewrite system. This can be seen by noting that each rule reduces the number of
applications of sequence operators in the left-hand side term or keeps that number the same
but reduces the size of the term. It is not difficult to show that t ≈ t↓ is valid in T_Seq.
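The rewrite rules of Fig. 3 are simple enough to run directly. The following C++ sketch uses a hypothetical term representation restricted to variables, ε, unit, update (keeping only the sequence argument, since the length rule ignores the other two), and concatenation. flatten applies the two concatenation rules, and reducedLength applies the length rules and prints the resulting sum; adding up the constants contributed by unit terms is a small arithmetic step beyond Fig. 3.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical term representation covering only the symbols used in Fig. 3.
struct Term;
using TermPtr = std::shared_ptr<const Term>;
struct Term {
  enum class Kind { Var, Eps, Unit, Update, Concat } kind;
  std::string name;           // variable name (Var) or element label (Unit)
  std::vector<TermPtr> kids;  // Update: {s} (index, element omitted); Concat: {s1, ..., sn}
};

TermPtr mk(Term::Kind k, std::string n = "", std::vector<TermPtr> kids = {}) {
  return std::make_shared<const Term>(Term{k, std::move(n), std::move(kids)});
}
TermPtr var(std::string n) { return mk(Term::Kind::Var, std::move(n)); }
TermPtr eps() { return mk(Term::Kind::Eps); }
TermPtr unit(std::string e) { return mk(Term::Kind::Unit, std::move(e)); }
TermPtr updateOf(TermPtr s) { return mk(Term::Kind::Update, "", {std::move(s)}); }
TermPtr concat(std::vector<TermPtr> ks) { return mk(Term::Kind::Concat, "", std::move(ks)); }

// u ++ ε ++ v -> u ++ v and u ++ (s1 ++ ... ++ sn) ++ v -> u ++ s1 ++ ... ++ sn ++ v,
// applied to completion. Collects the components (none means ε).
void flatten(const TermPtr& t, std::vector<TermPtr>& out) {
  if (t->kind == Term::Kind::Eps) return;
  if (t->kind == Term::Kind::Concat) {
    for (const TermPtr& k : t->kids) flatten(k, out);
    return;
  }
  out.push_back(t);
}

// Length rules applied to completion on |t|.
void lengthAtoms(const TermPtr& t, int& constant, std::vector<std::string>& atoms) {
  switch (t->kind) {
    case Term::Kind::Eps: return;                  // |ε| -> 0
    case Term::Kind::Unit: constant += 1; return;  // |unit(t)| -> 1
    case Term::Kind::Update:                       // |update(s, i, t)| -> |s|
      lengthAtoms(t->kids[0], constant, atoms);
      return;
    case Term::Kind::Concat:                       // |s1 ++ ... ++ sn| -> |s1| + ... + |sn|
      for (const TermPtr& k : t->kids) lengthAtoms(k, constant, atoms);
      return;
    case Term::Kind::Var:                          // |x| is irreducible
      atoms.push_back("|" + t->name + "|");
      return;
  }
}

// Renders the reduced form of |t| as "c + |x1| + ... + |xm|".
std::string reducedLength(const TermPtr& t) {
  int constant = 0;
  std::vector<std::string> atoms;
  lengthAtoms(t, constant, atoms);
  std::string out;
  if (constant != 0 || atoms.empty()) out = std::to_string(constant);
  for (const std::string& a : atoms) out += (out.empty() ? "" : " + ") + a;
  return out;
}
```

For instance, |unit(a) ++ (x ++ ε)| reduces to 1 + |x|: the nested concatenation is spliced, ε is dropped, and each remaining component contributes its own length.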

We now introduce some basic definitions related to concatenation terms. 
Definition 4. A concatenation term is a term of the form s1 ++ ... ++ sn with n ≥ 2.
If each si is a variable, it is a variable concatenation term. For a set S of sequence
constraints, a variable concatenation term x1 ++ ... ++ xn is singular in S if S ⊭ xi ≈ ε
for at most one variable xi with i ∈ [1, n]. A sequence variable x is atomic in S if
S ⊭ x ≈ ε and for all variable concatenation terms s ∈ 𝒯(S) such that S ⊨ x ≈ s, s
is singular in S.


We lift the concept of atomic variables to atomic representatives of equivalence classes. 


Definition 5. Let S be a set of sequence constraints. Assume a choice function
α : 𝒯(S)/≡_S → 𝒯(S) that chooses a variable from each equivalence class of ≡_S. A
sequence variable x is an atomic representative in S if it is atomic in S and
x = α([x]≡_S).


Finally, we introduce a relation that is the foundation for reasoning about concatena- 
tions. 


Definition 6. Let S be a set of sequence constraints. We inductively define a relation
S ⊨⁺ x ≈ s, where x is a sequence variable in S and s is a sequence term whose
variables are in 𝒯(S), as follows:

1. S ⊨⁺ x ≈ x for all sequence variables x ∈ 𝒯(S).
2. S ⊨⁺ x ≈ t for all sequence variables x ∈ 𝒯(S) and variable concatenation terms
   t, where x ≈ t ∈ S.
3. If S ⊨⁺ x ≈ (w̄ ++ y ++ z̄)↓ and S ⊨ y ≈ t and t is ε or a variable concatenation
   term in S that is not singular in S, then S ⊨⁺ x ≈ (w̄ ++ t ++ z̄)↓.

Let α be a choice function for S as defined in Definition 5. We additionally define the
entailment relation S ⊨*_α x ≈ ȳ, where y is of length n ≥ 0, to hold if each element of
y is an atomic representative in S and there exists z of length n such that S ⊨⁺ x ≈ z̄
and S ⊨ yi ≈ zi for i ∈ [1, n].


In other words, S ⊨*_α x ≈ t holds when t is a concatenation of atomic representatives
and is entailed to be equal to x by S. In practice, t is determined by recursively
expanding concatenations using equalities in S until a fixpoint is reached.


Example 3. Suppose S = {x ≈ y ++ z, y ≈ w ++ u, u ≈ v} (we omit the additional
constraints required by Assumption 1, part 2 for brevity). It is easy to see that u, v, w,
and z are atomic in S, but x and y are not. Furthermore, w and z (and one of u or v)
must also be atomic representatives. Clearly, S ⊨⁺ x ≈ x and S ⊨ x ≈ y ++ z.
Moreover, y ++ z is a variable concatenation term that is not singular in S. Hence, we
have S ⊨⁺ x ≈ (y ++ z)↓, and so S ⊨⁺ x ≈ y ++ z (by using either Item 2 or Item 3 of
Definition 6, as in fact x ≈ y ++ z ∈ S). Now, since S ⊨⁺ x ≈ y ++ z, S ⊨ y ≈ w ++ u,
and w ++ u is a variable concatenation term not singular in S, we get that
S ⊨⁺ x ≈ ((w ++ u) ++ z)↓, and so S ⊨⁺ x ≈ w ++ u ++ z. Now, assume that
v = α([v]≡) = α({v, u}). Then, S ⊨*_α x ≈ w ++ v ++ z.
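The recursive expansion used to compute such normal forms can be sketched in C++ as follows. This is our own simplification, not the cvc5 procedure: it assumes every concatenation in S is non-singular (no variable is entailed to equal ε), that each defining equality x ≈ y1 ++ ... ++ yn is stored under the chosen representative of x's class, and that α is given as an explicit map. The names Constraints, expand, and normalForm are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

using Var = std::string;
using VarList = std::vector<Var>;

// Hypothetical inputs: defs[r] lists the components of an equality
// r ≈ y1 ++ ... ++ yn in S, keyed by the class representative r; rep[x] is
// alpha([x]); variables missing from rep are their own representatives.
struct Constraints {
  std::map<Var, VarList> defs;
  std::map<Var, Var> rep;
  Var alpha(const Var& x) const {
    auto it = rep.find(x);
    return it == rep.end() ? x : it->second;
  }
};

// Expands concatenations recursively until a fixpoint, emitting the
// representatives of the atomic components. Throws on cyclic definitions.
void expand(const Constraints& c, const Var& x, std::set<Var>& onPath, VarList& out) {
  const Var r = c.alpha(x);
  auto it = c.defs.find(r);
  if (it == c.defs.end()) {  // no defining concatenation: r is atomic
    out.push_back(r);
    return;
  }
  if (!onPath.insert(r).second) throw std::runtime_error("cyclic concatenations");
  for (const Var& y : it->second) expand(c, y, onPath, out);
  onPath.erase(r);
}

VarList normalForm(const Constraints& c, const Var& x) {
  std::set<Var> onPath;
  VarList out;
  expand(c, x, onPath, out);
  return out;
}
```

Applied to Example 3 with α({u, v}) = v, the sketch yields w ++ v ++ z. On the cyclic constraints of Example 4 below (x ≈ y ++ z, z ≈ v ++ x ++ w) it reports the cycle instead of looping; the calculus refutes that case through length reasoning.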


Our calculi can be understood as modeling abstractly a cooperation between an arith- 
metic subsolver and a sequence subsolver. Many of the derivation rules lift those in the 
string calculus of Liang et al. [17] to sequences of elements of an arbitrary type. We 
describe them similarly as rules that modify configurations. 
Definition 7. A configuration is either the distinguished configuration unsat or a pair
(S, A) of a set S of sequence constraints and a set A of arithmetic constraints.


The rules are given in guarded assignment form, where the rule premises describe the 
conditions on the current configuration under which the rule can be applied, and the 
conclusion is either unsat, or otherwise describes the resulting modifications to the 
configuration. A rule may have multiple conclusions separated by ||. In the rules, some
of the premises have the form S ⊨ s ≈ t (see Definition 2). Such entailments can be
checked with standard algorithms for congruence closure. Similarly, premises of the
form A ⊨_LIA s ≈ t can be checked by solvers for linear integer arithmetic.

An application of a rule is redundant if it has a conclusion where each component 
in the derived configuration is a subset of the corresponding component in the premise 
configuration. We assume that for rules that introduce fresh variables, the introduced 
variables are identical whenever the premises triggering the rule are the same (i.e., we 
cannot generate an infinite sequence of rule applications by continuously using the same 
premises to introduce fresh variables).³ A configuration other than unsat is saturated
with respect to a set R of derivation rules if every possible application of a rule in R 
to it is redundant. A derivation tree is a tree where each node is a configuration whose 
children, if any, are obtained by a non-redundant application of a rule of the calculus. 
A derivation tree is closed if all of its leaves are unsat. As we show later, a closed
derivation tree with root node (S, A) is a proof that A ∪ S is unsatisfiable in T_Seq. In
contrast, a derivation tree with root node (S, A) and a saturated leaf with respect to all
the rules of the calculus is a witness that A ∪ S is satisfiable in T_Seq.


4.1 Basic Calculus 
Definition 8. The calculus BASE consists of the derivation rules in Figs. 4 and 5. 


Some of the rules are adapted from previous work on string solvers [17,22]. Compared
to that work, our presentation of the rules is noticeably simpler, due to our use of the
relation ⊨*_α from Definition 6. In particular, our configurations consist only of pairs of
sets of formulas, without any auxiliary data structures.

Note that judgments of the form S ⊨*_α x ≈ t are used in premises of the calculus.
It is possible to compute whether such a premise holds thanks to the following lemma.


Lemma 3. Let S be a set of sequence constraints and A a set of arithmetic constraints.
If (S, A) is saturated w.r.t. S-Prop, L-Intro and L-Valid, the problem of determining
whether S ⊨*_α x ≈ s for given x and s is decidable.


Lemma 3 assumes saturation with respect to certain rules. Accordingly, our proof
strategy, described in Sect. 5, will ensure such saturation before attempting to apply rules
relying on ⊨*_α. The relation ⊨*_α induces a normal form for each equivalence class of
≡_S.


³ In practice, this is implemented by associating each introduced variable with a witness term as
described in [21].
A-Conf:    A ⊨_LIA ⊥  ⟹  unsat
A-Prop:    A ⊨_LIA s ≈ t,  s, t ∈ 𝒯(S)  ⟹  S := S, s ≈ t
S-Conf:    S ⊨ ⊥  ⟹  unsat
S-Prop:    S ⊨ s ≈ t,  s, t are Σ_LIA-terms  ⟹  A := A, s ≈ t
S-A:       x, y ∈ 𝒯(S) ∩ 𝒯(A),  x, y : Int  ⟹  A := A, x ≈ y  ||  A := A, x ≉ y
L-Intro:   s ∈ 𝒯(S),  s : Seq  ⟹  S := S, |s| ≈ (|s|)↓
L-Valid:   x ∈ 𝒯(S),  x : Seq  ⟹  S := S, x ≈ ε  ||  A := A, ℓ_x > 0
U-Eq:      S ⊨ unit(x) ≈ unit(y)  ⟹  S := S, x ≈ y
C-Eq:      S ⊨*_α x ≈ z̄,  S ⊨*_α y ≈ z̄  ⟹  S := S, x ≈ y
C-Split:   S ⊨*_α x ≈ w̄ ++ y ++ z̄,  S ⊨*_α x ≈ w̄ ++ y′ ++ z̄′  ⟹
             A := A, ℓ_y > ℓ_y′;  S := S, y ≈ y′ ++ k  ||
             A := A, ℓ_y < ℓ_y′;  S := S, y′ ≈ y ++ k  ||
             A := A, ℓ_y ≈ ℓ_y′;  S := S, y ≈ y′
Deq-Ext:   x ≉ y ∈ S,  x, y : Seq  ⟹
             A := A, ℓ_x ≉ ℓ_y  ||
             A := A, ℓ_x ≈ ℓ_y, 0 ≤ i < ℓ_x;  S := S, w1 ≈ nth(x, i), w2 ≈ nth(y, i), w1 ≉ w2

Fig. 4. Core derivation rules. The rules use k and i to denote fresh variables of sequence and
integer sort, respectively, and w1 and w2 for fresh element variables.


Lemma 4. Let S be a set of sequence constraints and A a set of arithmetic constraints.
Suppose (S, A) is saturated w.r.t. A-Conf, S-Prop, L-Intro, L-Valid, and C-Split. Then,
for every equivalence class e of ≡_S whose terms are of sort Seq, there exists a unique
(possibly empty) vector s such that whenever S ⊨*_α x ≈ s̄′ for x ∈ e, then s′ = s. In
this case, we call s̄ the normal form of e (and of x).


We now turn to the description of the rules in Fig. 4, which form the core of the
calculus. For greater clarity, some of the conclusions of the rules include terms before
they are flattened. First, either subsolver can report that the current set of constraints is
unsatisfiable by using the rules A-Conf or S-Conf. For the former, the entailment ⊨_LIA
(entailment modulo T_LIA) can be checked by a standard procedure for linear integer
arithmetic, and the latter corresponds to a situation where congruence closure detects
a conflict between an equality and a disequality. The rules A-Prop, S-Prop, and S-A
correspond to a form of Nelson-Oppen-style theory combination between the two
subsolvers. The first two communicate equalities between the subsolvers, while the third
guesses arrangements for shared variables of sort Int. L-Intro ensures that the length
term |s| for each sequence term s is equal to its reduced form (|s|)↓. L-Valid restricts
sequence lengths to be non-negative, splitting on whether each sequence is empty or
has a length greater than 0. The unit operator is injective, which is captured by U-Eq.
C-Eq concludes that two sequence terms are equal if they have the same normal form. If
two sequence variables have different normal forms, then C-Split takes the first differing
components y and y′ from the two normal forms and splits on their length relationship.
Note that C-Split is the source for non-termination of the calculus (see, e.g., [17,22]).
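The comparison that C-Eq and C-Split perform on normal forms can be illustrated with a short C++ sketch. It treats normal forms as lists of variable names (an assumption for illustration), finds the first differing components y and y′ after a common prefix, and spells out the three C-Split branches as strings, with k standing for the fresh variable and l_y for ℓ_y. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Var = std::string;
using VarList = std::vector<Var>;

// Given component lists w ++ y ++ z and w ++ y' ++ z', returns the first
// differing pair (y, y') after their common prefix w. Returns nullopt if the
// lists agree on all shared positions: identical lists are the situation
// handled by C-Eq, and the case where one list is a proper prefix of the
// other is not covered by this sketch.
std::optional<std::pair<Var, Var>> firstDifference(const VarList& a, const VarList& b) {
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] != b[i]) return std::make_pair(a[i], b[i]);
  return std::nullopt;
}

// The three C-Split branches for (y, y'), one string per branch:
// arithmetic part, then sequence part.
std::vector<std::string> cSplitBranches(const Var& y, const Var& yp) {
  return {"l_" + y + " > l_" + yp + " ; " + y + " = " + yp + " ++ k",
          "l_" + y + " < l_" + yp + " ; " + yp + " = " + y + " ++ k",
          "l_" + y + " = l_" + yp + " ; " + y + " = " + yp};
}
```

The three branches are exhaustive for the lengths of y and y′, which is why the rule can be applied repeatedly without losing solutions, and also why, on cyclic constraints, it may keep producing new splits.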
R-Extract:  x ≈ extract(y, i, j) ∈ S  ⟹
              A := A, i < 0 ∨ i ≥ ℓ_y ∨ j ≤ 0;  S := S, x ≈ ε  ||
              A := A, 0 ≤ i < ℓ_y, j > 0, ℓ_k ≈ i, ℓ_x ≈ min(j, ℓ_y − i);  S := S, y ≈ k ++ x ++ k′

R-Nth:      x ≈ nth(y, i) ∈ S  ⟹
              A := A, i < 0 ∨ i ≥ ℓ_y  ||
              A := A, 0 ≤ i < ℓ_y, ℓ_k ≈ i;  S := S, y ≈ k ++ unit(x) ++ k′

R-Update:   x ≈ update(y, i, z) ∈ S  ⟹
              A := A, i < 0 ∨ i ≥ ℓ_y;  S := S, x ≈ y  ||
              A := A, 0 ≤ i < ℓ_y, ℓ_k ≈ i, ℓ_k′ ≈ 1;  S := S, y ≈ k ++ k′ ++ k″, x ≈ k ++ unit(z) ++ k″

Fig. 5. Reduction rules for extract, nth, and update. The rules use k, k′, and k″ to denote fresh
sequence variables. We write s ≈ min(t, u) as an abbreviation for s ≈ t ∨ s ≈ u, s ≤ t, s ≤ u.


Finally, Deq-Ext handles disequalities between sequences x and y by either asserting
that their lengths are different or by choosing an index i at which they differ.
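On concrete sequences, the two branches of Deq-Ext correspond to the two ways in which distinct sequences can differ. The C++ sketch below (hypothetical names, with Elem modeled as int) reports which branch a pair of distinct vectors satisfies, or nothing when the vectors are equal, in which case the disequality itself is false.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Which Deq-Ext branch a pair of distinct concrete sequences satisfies.
struct DeqWitness {
  bool lengthsDiffer;                // first branch: l_x != l_y
  std::optional<std::size_t> index;  // second branch: l_x = l_y and nth(x, i) != nth(y, i)
};

// Returns a witness for x != y, or nullopt when x == y.
std::optional<DeqWitness> deqWitness(const std::vector<int>& x, const std::vector<int>& y) {
  if (x.size() != y.size()) return DeqWitness{true, std::nullopt};
  for (std::size_t i = 0; i < x.size(); ++i)
    if (x[i] != y[i]) return DeqWitness{false, i};
  return std::nullopt;
}
```

In the second branch, the chosen index is in bounds for both sequences because their lengths coincide, which is exactly the constraint 0 ≤ i < ℓ_x added alongside ℓ_x ≈ ℓ_y.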

Figure 5 includes a set of reduction rules for handling operators that are not directly 
handled by the core rules. These reduction rules capture the semantics of these operators 
by reduction to concatenation. R-Extract splits into two cases: Either the extraction uses 
an out-of-bounds index or a non-positive length, in which case the result is the empty 
sequence, or the original sequence can be described as a concatenation that includes 
the extracted sub-sequence. R-Nth creates an equation between y and a concatenation
term with unit(x) as one of its components, as long as i is not out of bounds. R-Update
considers two cases. If i is out of bounds, then the update term is equal to y. Otherwise,
y is equal to a concatenation, with the middle component (k′) representing the part of y
that is updated. In the update term, k′ is replaced by unit(z).
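The reductions of Fig. 5 can be checked against the semantics of Sect. 3 on concrete vectors. In the C++ sketch below (our illustration; Elem is modeled as int and all names are hypothetical), the in-bounds branches of R-Update and R-Extract build the decompositions y = k ++ k′ ++ k″ and y = k ++ x ++ k′ with the length constraints of the rules, assert that the pieces concatenate back to y, and return the reduced result.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Seq = std::vector<int>;

// y[from, from + len) as a new sequence; callers keep the range in bounds.
Seq slice(const Seq& y, long from, long len) {
  return Seq(y.begin() + from, y.begin() + from + len);
}

Seq cat(const Seq& a, const Seq& b) {
  Seq r = a;
  r.insert(r.end(), b.begin(), b.end());
  return r;
}

// Reference semantics of update from Sect. 3.
Seq updateRef(Seq s, long i, int a) {
  if (i >= 0 && i < static_cast<long>(s.size())) s[i] = a;
  return s;
}

// R-Update: out of bounds -> x = y; otherwise y = k ++ k' ++ k'' with
// |k| = i and |k'| = 1, and x = k ++ unit(z) ++ k''.
Seq updateByReduction(const Seq& y, long i, int z) {
  const long n = static_cast<long>(y.size());
  if (i < 0 || i >= n) return y;
  Seq k = slice(y, 0, i), k1 = slice(y, i, 1), k2 = slice(y, i + 1, n - i - 1);
  assert(cat(cat(k, k1), k2) == y);  // the decomposition of y holds
  return cat(cat(k, Seq{z}), k2);
}

// R-Extract: i < 0, i >= |y|, or j <= 0 -> x = ε; otherwise y = k ++ x ++ k'
// with |k| = i and |x| = min(j, |y| - i).
Seq extractByReduction(const Seq& y, long i, long j) {
  const long n = static_cast<long>(y.size());
  if (i < 0 || i >= n || j <= 0) return {};
  const long len = std::min(j, n - i);
  Seq k = slice(y, 0, i), x = slice(y, i, len), k2 = slice(y, i + len, n - i - len);
  assert(cat(cat(k, x), k2) == y);  // the decomposition of y holds
  return x;
}
```

Comparing updateByReduction with updateRef on every index from −1 to |y| exercises both branches of R-Update, including the out-of-bounds case where x ≈ y.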


Example 4. Consider a configuration (S, A), where S contains the formulas x ≈ y ++ z,
z ≈ v ++ x ++ w, and v ≈ unit(u), and A is empty. Hence, S ⊨ |x| ≈ |y ++ z|. By
L-Intro, we have S ⊨ |y ++ z| ≈ |y| + |z|. Together with Assumption 1, we have
S ⊨ ℓ_x ≈ ℓ_y + ℓ_z, and then with S-Prop, we have ℓ_x ≈ ℓ_y + ℓ_z ∈ A. Similarly, we
can derive ℓ_z ≈ ℓ_v + ℓ_x + ℓ_w, ℓ_v ≈ 1 ∈ A, and so (∗) A ⊨_LIA ℓ_z ≈ 1 + ℓ_y + ℓ_z + ℓ_w.
Notice that for any variable k of sort Seq, we can apply L-Valid, L-Intro, and S-Prop to
add to A either ℓ_k > 0 or ℓ_k ≈ 0. Applying this to y, z, w, we have that A ⊨_LIA ⊥ in
each branch thanks to (∗), and so A-Conf applies and we get unsat.
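The arithmetic in Example 4 can be written out as a short derivation. The first line collects the equalities added to A, (∗) follows by substituting ℓ_v and ℓ_x, and the last line, which we add only to make the conflict explicit, cancels ℓ_z on both sides.

```latex
\begin{align*}
  \ell_x &\approx \ell_y + \ell_z, \qquad
  \ell_z \approx \ell_v + \ell_x + \ell_w, \qquad
  \ell_v \approx 1 && \text{(in $A$ via L-Intro and S-Prop)}\\
  \ell_z &\approx 1 + \ell_y + \ell_z + \ell_w && (*)\\
  0 &\approx 1 + \ell_y + \ell_w && \text{(subtracting $\ell_z$)}
\end{align*}
```

Since every branch adds either ℓ_k > 0 or ℓ_k ≈ 0 for k ∈ {y, w}, the right-hand side of the last line is at least 1 in each branch, so A ⊨_LIA ⊥.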


4.2 Extended Calculus 


Definition 9. The calculus EXT is comprised of the derivation rules in Figs. 4 and 6, 
with the addition of rule R-Extract from Fig. 5. 


136 Y. Sheng et al.

Our extended calculus combines array reasoning, based on [10] and expressed by
the rules in Fig. 6, with the core rules of Fig. 4 and the R-Extract rule. Unlike in BASE,
those rules do not reduce nth and update. Instead, they reason about those operators
directly and handle their combination with concatenation. Nth-Concat identifies the ith
element of sequence y with the corresponding element selected from its normal form
(see Lemma 4). Update-Concat operates similarly, applying update to all the
components. Update-Concat-Inv operates similarly on the updated sequence rather than on
the original sequence. Nth-Unit captures the semantics of nth when applied to a unit term.
Update-Unit is similar and distinguishes an update on an out-of-bounds index (different
from 0) from an update within the bound. Nth-Intro is meant to ensure that Nth-Update
(explained below) and Nth-Unit (explained above) are applicable whenever an update
term exists in the constraints. Nth-Update captures the read-over-write axioms of arrays,
adapted to consider their lengths (see, e.g., [10]). It distinguishes three cases: In the first,
the update index is out of bounds. In the second, it is not out of bounds, and the
corresponding nth term accesses the same index that was updated. In the third case, the
index used in the nth term is different from the updated index. Update-Bound considers
two cases: either the update changes the sequence, or the sequence remains the same.
Finally, Nth-Split introduces a case split on the equality between two sequence variables
x and x′ whenever they appear as arguments to nth with equivalent second arguments.
This is needed to ensure that we detect all cases where the arguments of two nth terms
must be equal.
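The index arithmetic behind Nth-Concat and Update-Concat can be illustrated on concrete vectors. The C++ sketch below (our illustration, with Elem modeled as int and hypothetical names) locates the component wm and the offset i − (ℓ_w1 + ... + ℓ_w(m−1)) that nth reads from, and computes update(w1 ++ ... ++ wn, i, v) component-wise with shifted indices, leaving a component untouched when its shifted index is out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

using Seq = std::vector<int>;

// Nth-Concat: for y = w1 ++ ... ++ wn and 0 <= i < |y|, nth(y, i) equals
// nth(wm, i - (|w1| + ... + |w(m-1)|)) for the unique m whose range contains i.
// Returns (m, offset) with 0-based m, or nullopt when i is out of bounds.
std::optional<std::pair<std::size_t, long>> locate(const std::vector<Seq>& ws, long i) {
  if (i < 0) return std::nullopt;
  long before = 0;  // |w1| + ... + |w(m-1)|
  for (std::size_t m = 0; m < ws.size(); ++m) {
    const long len = static_cast<long>(ws[m].size());
    if (i < before + len) return std::make_pair(m, i - before);
    before += len;
  }
  return std::nullopt;  // i >= |y|
}

// Update-Concat: update(w1 ++ ... ++ wn, i, v) equals z1 ++ ... ++ zn with
// zm = update(wm, i - (|w1| + ... + |w(m-1)|), v).
Seq updateConcat(const std::vector<Seq>& ws, long i, int v) {
  Seq result;
  long before = 0;
  for (const Seq& w : ws) {
    Seq z = w;
    const long j = i - before;  // shifted index for this component
    if (j >= 0 && j < static_cast<long>(z.size())) z[j] = v;
    result.insert(result.end(), z.begin(), z.end());
    before += static_cast<long>(w.size());
  }
  return result;
}
```

Empty components contribute nothing to the offsets, matching ℓ_w ≈ 0 for such a component, and an out-of-bounds i leaves every component unchanged, as the semantics of update requires.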


4.3 Correctness 
In this section we prove the following theorem: 


Theorem 1. Let X ∈ {BASE, EXT} and (S0, A0) be a configuration, and assume without
loss of generality that A0 contains only arithmetic constraints that are not sequence
constraints. Let T be a derivation tree obtained by applying the rules of X with (S0, A0)
as the initial configuration.

1. If T is closed, then S0 ∪ A0 is T_Seq-unsatisfiable.
2. If T contains a saturated configuration (S, A) w.r.t. X, then S ∪ A is T_Seq-satisfiable,
   and so is S0 ∪ A0.


The theorem states that the calculi are correct in the following sense: if a closed
derivation tree is obtained for the constraints S0 ∪ A0, then those constraints are
unsatisfiable in T_Seq; if a tree with a saturated leaf is obtained, then they are satisfiable. It is possible,
however, that neither kind of tree can be derived by the calculi, making them neither 
refutation-complete nor terminating. This is not surprising since, as mentioned in the 
introduction, the decidability of even weaker theories is still unknown. 

Proving the first claim in Theorem 1 reduces to a local soundness argument for each
of the rules. For the second claim, we sketch below how to construct a satisfying model 
M from a saturated configuration for the case of EXT. The case for BASE is similar 
and simpler. 


Model Construction Steps. The full model construction and its correctness are 
described in a longer version of this paper [23] together with a proof of the theorem 
above. Here is a summary of the steps needed for the model construction. 


1. Sorts: M(Elem) is interpreted as some arbitrary countably infinite set. M(Seq) and
   M(Int) are then determined by the theory.
2. Σ_Seq-symbols: T_Seq enforces the interpretation of almost all Σ_Seq-symbols, except
   for nth when the second input is out of bounds. We cover this case below.
3. Integer variables: based on the saturation w.r.t. A-Conf, we know there is some
   T_LIA-model satisfying A. We set M to interpret integer variables according to this model.
4. Element variables: these are partitioned into their ≡_S equivalence classes. Each class
   is assigned a distinct element from M(Elem), which is possible since it is infinite.
5. Atomic sequence variables: these are assigned interpretations in several sub-steps:
   (a) length: we first use the assignments to variables ℓ_x to set the length of M(x),
       without assigning its actual value.
   (b) unit variables: for variables x with x ≡_S unit(z), we set M(x) to be [M(z)].
   (c) non-unit variables: all other sequence variables are assigned values according
       to a weak equivalence graph we construct in a manner similar to [10]. This
       construction takes into account constraints that involve update and nth.
6. Non-atomic sequence variables: these are first transformed to their unique normal
   form (see Lemma 4), consisting of concatenations of atomic variables. Then, the
   values assigned to these variables are concatenated.
7. nth-terms: for out-of-bounds indices in nth-terms, we rely on ≡_S to make sure that
   the assignment is consistent.

Nth-Concat:    x ≈ nth(y, i) ∈ S,  S ⊨*_α y ≈ w1 ++ ... ++ wn  ⟹
                 A := A, i < 0 ∨ i ≥ ℓ_y  ||
                 A := A, 0 ≤ i < ℓ_w1;  S := S, x ≈ nth(w1, i)  ||  ...  ||
                 A := A, ℓ_w1 + ... + ℓ_w(n−1) ≤ i < ℓ_y;
                   S := S, x ≈ nth(wn, i − (ℓ_w1 + ... + ℓ_w(n−1)))

Update-Concat:  x ≈ update(y, i, v) ∈ S,  S ⊨*_α y ≈ w1 ++ ... ++ wn  ⟹
                 S := S, x ≈ z1 ++ ... ++ zn,  z1 ≈ update(w1, i, v), ...,
                   zn ≈ update(wn, i − (ℓ_w1 + ... + ℓ_w(n−1)), v)

Update-Concat-Inv:  x ≈ update(y, i, v) ∈ S,  S ⊨*_α x ≈ w1 ++ ... ++ wn  ⟹
                 S := S, y ≈ z1 ++ ... ++ zn,  w1 ≈ update(z1, i, v), ...,
                   wn ≈ update(zn, i − (ℓ_w1 + ... + ℓ_w(n−1)), v)

Nth-Unit:      x ≈ nth(y, i) ∈ S,  S ⊨ y ≈ unit(u)  ⟹
                 A := A, i < 0 ∨ i > 0  ||  A := A, i ≈ 0;  S := S, x ≈ u

Update-Unit:   x ≈ update(y, i, v) ∈ S,  S ⊨ y ≈ unit(u)  ⟹
                 A := A, i < 0 ∨ i > 0;  S := S, x ≈ unit(u)  ||
                 A := A, i ≈ 0;  S := S, x ≈ unit(v)

Nth-Intro:     s′ ≈ update(s, i, t) ∈ S  ⟹  S := S, e ≈ nth(s, i), e′ ≈ nth(s′, i)

Nth-Update:    nth(x, j) ∈ 𝒯(S),  y ≈ update(z, i, v) ∈ S,  S ⊨ x ≈ y or S ⊨ x ≈ z  ⟹
                 A := A, j < 0 ∨ j ≥ ℓ_x  ||
                 A := A, i ≈ j, 0 ≤ j < ℓ_x;  S := S, nth(y, j) ≈ v  ||
                 A := A, i ≉ j, 0 ≤ j < ℓ_x;  S := S, nth(y, j) ≈ nth(z, j)

Update-Bound:  x ≈ update(y, i, v) ∈ S  ⟹
                 A := A, 0 ≤ i < ℓ_y;  S := S, nth(y, i) ≉ v  ||  S := S, x ≈ y

Nth-Split:     nth(x, i), nth(x′, i′) ∈ 𝒯(S),  i ≈ i′ ∈ A  ⟹
                 S := S, x ≈ x′  ||  S := S, x ≉ x′

Fig. 6. Extended derivation rules. The rules use z1, ..., zn to denote fresh sequence variables and
e, e′ to denote fresh element variables.


We conclude this section with an example of the construction of M. 


Example 5. Consider a signature in which Elem is Int, and a saturated configuration $(S^*, A^*)$ w.r.t. EXT that includes the following formulas: $y \approx y_1 \mathbin{++} y_2$, $x \approx x_1 \mathbin{++} x_2$, $y_2 \approx x_2$, $y_1 \approx \text{update}(x_1, i, a)$, $|y_1| = |x_1|$, $|y_2| = |x_2|$, $\text{nth}(y, i) \approx a$, $\text{nth}(y_1, i) \approx a$.
Following the above construction, a satisfying interpretation M can be built as follows: 




Step 1 Set both M(Int) and M(Elem) to be the set of integer numbers. M(Seq) is fixed by the theory.

Step 3, Step 4 First, find an arithmetic model: $M(\ell_y) = M(\ell_x) = 4$, $M(\ell_{y_1}) = M(\ell_{x_1}) = 2$, $M(\ell_{y_2}) = M(\ell_{x_2}) = 2$, $M(i) = 0$. Further, set $M(a) = 0$.

Step 5a Start assigning values to sequences. First, set the lengths of M(x) and M(y) to be 4, and the lengths of $M(x_1)$, $M(x_2)$, $M(y_1)$, $M(y_2)$ to be 2.

Step 5b is skipped as there are no unit terms. 

Step 5c Set the 0th element of $M(y_1)$ to 0 to satisfy $\text{nth}(y_1, i) \approx a$ ($y_1$ is atomic, y is not). Assign fresh values to the remaining indices of atomic variables. The result can be, e.g., $M(y_1) = [0,2]$, $M(x_1) = [1,2]$, $M(y_2) = M(x_2) = [3,4]$.

Step 6 Assign non-atomic sequence variables based on equivalent concatenations: M(y) = [0,2,3,4], M(x) = [1,2,3,4].

Step 7 No integer variable in the formula was assigned an out-of-bounds value, and so the interpretation of nth on out-of-bounds cases is set arbitrarily.
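The construction in Example 5 can be double-checked mechanically. The following Python sketch is our own illustration, not part of cvc5. It encodes sequences as Python lists (Elem is Int) and uses hypothetical helpers `nth` and `update` with 0-based indices. It plugs in the values chosen in Steps 3 to 6 and evaluates every formula of the configuration.

```python
from typing import List

def nth(s: List[int], i: int) -> int:
    # In-bounds read; out-of-bounds reads are unconstrained in T_Seq,
    # so this sketch simply rejects them.
    if not 0 <= i < len(s):
        raise IndexError("out-of-bounds nth is not modeled here")
    return s[i]

def update(s: List[int], i: int, v: int) -> List[int]:
    # Replace the i-th element; an out-of-bounds update leaves s unchanged.
    if 0 <= i < len(s):
        return s[:i] + [v] + s[i + 1:]
    return list(s)

# Steps 3 and 4: integer and element variables.
i, a = 0, 0
# Step 5c: atomic sequence variables.
y1, x1 = [0, 2], [1, 2]
y2 = x2 = [3, 4]
# Step 6: non-atomic variables are concatenations of their normal forms.
y, x = y1 + y2, x1 + x2

def check() -> bool:
    """Evaluate every formula of the configuration in Example 5."""
    return all([
        y == y1 + y2,
        x == x1 + x2,
        y2 == x2,
        y1 == update(x1, i, a),
        len(y1) == len(x1),
        len(y2) == len(x2),
        nth(y, i) == a,
        nth(y1, i) == a,
    ])

print(y, x, check())  # [0, 2, 3, 4] [1, 2, 3, 4] True
```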


5 Implementation 


We implemented our procedure for sequences as an extension of a previous theory 
solver for strings [17,22]. This solver is integrated in cvc5, and has been generalized to 
reason about both strings and sequences. In this section, we describe how the rules of 
the calculus are implemented and the overall strategy for when they are applied. 

Like most SMT solvers, cvc5 is based on the CDCL(T) architecture [19], which combines several subsolvers, each specialized in a specific theory, with a solver for propositional satisfiability (SAT). Following that architecture, cvc5 maintains an evolving set of formulas F. When F starts with quantifier-free formulas over the theory $T_{Seq}$, the case targeted by this work, the SAT solver searches for a satisfying assignment for F, represented as the set M of literals it satisfies. If none exists, the problem is unsatisfiable at the propositional level and hence $T_{Seq}$-unsatisfiable. Otherwise, M is partitioned into the arithmetic constraints A and the sequence constraints S and checked for $T_{Seq}$-satisfiability using the rules of the EXT calculus. Many of those rules, including all those with multiple conclusions, are implemented by adding new formulas to F (following the splitting-on-demand approach [4]). This causes the SAT solver to try to extend its assignment to those formulas, which results in the addition of new literals to M (and thereby also to A and S).
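To make this interaction concrete, here is a toy version of the loop in Python. It rests on simplifying assumptions: the SAT solver is a brute-force enumeration, and the "theory" is a hypothetical one that only knows transitivity of equality among three variables. The names `sat_solve`, `cdcl_t`, and `toy_theory` are ours. A theory check either accepts the propositional model M or returns a new clause. The atom set is recomputed from F in every round, so returned clauses may also mention fresh atoms. That is how splitting on demand feeds new case splits into the SAT solver's search.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Literal = Tuple[str, bool]          # (atom, polarity)
Clause = List[Literal]
Assignment = Dict[str, bool]

def sat_solve(clauses: List[Clause]) -> Optional[Assignment]:
    """Brute-force SAT search (stands in for a real CDCL solver)."""
    atoms = sorted({atom for clause in clauses for atom, _ in clause})
    for values in product([False, True], repeat=len(atoms)):
        m = dict(zip(atoms, values))
        if all(any(m[atom] == pol for atom, pol in c) for c in clauses):
            return m
    return None

def cdcl_t(clauses: List[Clause],
           theory_check: Callable[[Assignment], Tuple[str, Optional[Clause]]]):
    """Lazy SMT loop: the SAT solver proposes M; the theory accepts it or
    returns a clause (a conflict clause or a splitting-on-demand lemma,
    possibly over fresh atoms) that is added to F before the next round."""
    clauses = list(clauses)
    while True:
        m = sat_solve(clauses)
        if m is None:
            return "unsat", None
        verdict, clause = theory_check(m)
        if verdict == "sat":
            return "sat", m
        clauses.append(clause)

# Toy theory (hypothetical): transitivity of equality over x, y, z.
RULES = [("x=y", "y=z", "x=z"), ("x=y", "x=z", "y=z"), ("x=z", "y=z", "x=y")]

def toy_theory(m: Assignment) -> Tuple[str, Optional[Clause]]:
    for p, q, r in RULES:
        if m.get(p) and m.get(q) and m.get(r) is False:
            return "conflict", [(p, False), (q, False), (r, True)]
    return "sat", None

print(cdcl_t([[("x=y", True)], [("y=z", True)], [("x=z", False)]], toy_theory))
# ('unsat', None)
```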

In this setting, the rules of the two calculi are implemented as follows. The effect 
of rule A-Conf is achieved by invoking cvc5’s theory solver for linear integer arithmetic. 
Rule S-Conf is implemented by the congruence closure submodule of the theory solver 
for sequences. Rules A-Prop and S-Prop are implemented by the standard mechanism 
for theory combination. Note that each of these four rules may be applied eagerly, that 
is, before constructing a complete satisfying assignment M for F. 

The remaining rules are implemented in the theory solver for sequences. Each time 
M is checked for satisfiability, cvc5 follows a strategy to determine which rule to apply 
next. If none of the rules apply and the configuration is different from unsat, then it is 
saturated, and the solver returns sat. The strategy for EXT prioritizes rules as follows. 
Only the first applicable rule is applied (and then control goes back to the SAT solver). 


1. (Add length constraints) For each sequence term in S, apply L-Intro or L-Valid, if not 
already done. We apply L-Intro for non-variables, and L-Valid for variables. 

2. (Mark congruent terms) For each set of update (resp. nth) terms that are congruent 
to one another in the current configuration, mark all but one term and ignore the 
marked terms in the subsequent steps. 

3. (Reduce extract) For extract(y, i, j) in S, apply R-Extract if not already done.

4. (Construct normal forms) Apply U-Eq or C-Split. We choose how to apply the latter rule based on constructing normal forms for equivalence classes in a bottom-up fashion, where the equivalence classes of x and y are considered before the equivalence class of $x \mathbin{++} y$. We do this until we find an equivalence class such that $S \models z \approx u_1$ and $S \models z \approx u_2$ for distinct $u_1, u_2$.

5. (Normal forms) Apply C-Eq if two equivalence classes have the same normal form.

6. (Extensionality) For each disequality in S, apply Deq-Ext, if not already done.

7. (Distribute update and nth) For each term update(x, i,t) (resp. nth(x, j)) such that 
the normal form of x is a concatenation term, apply Update-Concat and Update- 
Concat-Inv (resp. Nth-Concat) if not already done. Alternatively, if the normal form 
of the equivalence class of x is a unit term, apply Update-Unit (resp. Nth-Unit). 

8. (Array reasoning on atomic sequences) Apply Nth-Intro and Update-Bound to 
update terms. For each update term, find the matching nth terms and apply 
Nth-Update. Apply Nth-Split to pairs of nth terms with equivalent indices. 

9. (Theory combination) Apply S-A for all arithmetic terms occurring in both S and A. 
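The prioritized strategy can be pictured as a loop that scans the steps in order, fires only the first applicable one, and then restarts. The Python sketch below models just three of the steps: adding length constraints, reducing extract, and extensionality. It works over a hypothetical dictionary-based configuration. The rule bookkeeping, the fresh variable `z1`, and all function names are illustrative assumptions. Applying a rule is simulated by recording it, not by a round trip through the SAT solver.

```python
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, list]
Rule = Tuple[str, Callable[[Config], Optional[str]]]

def pending(cfg: Config, rule: str, terms: List[str]) -> Optional[str]:
    """First term the rule has not been applied to yet, if any."""
    return next((t for t in terms if (rule, t) not in cfg["done"]), None)

# A hypothetical subset of the priority list, highest priority first.
STEPS: List[Rule] = [
    ("L-Intro/L-Valid", lambda c: pending(c, "L-Intro/L-Valid", c["seq_terms"])),
    ("R-Extract", lambda c: pending(c, "R-Extract", c["extract_terms"])),
    ("Deq-Ext", lambda c: pending(c, "Deq-Ext", c["disequalities"])),
]

def first_applicable(cfg: Config) -> Optional[Tuple[str, str]]:
    # Scan in priority order; only the first applicable rule fires.
    for rule, find in STEPS:
        term = find(cfg)
        if term is not None:
            return rule, term
    return None

def saturate(cfg: Config) -> List[Tuple[str, str]]:
    trace = []
    while (hit := first_applicable(cfg)) is not None:
        trace.append(hit)
        cfg["done"].append(hit)  # stands in for adding lemmas to F
        if hit[0] == "R-Extract":
            # The reduction introduces a fresh sequence variable (hypothetical
            # name), which the higher-priority length step must handle first.
            cfg["seq_terms"].append("z1")
    return trace  # no rule applies: the configuration is saturated

cfg = {"seq_terms": ["x", "y"], "extract_terms": ["extract(y,i,j)"],
       "disequalities": ["x != y"], "done": []}
print(saturate(cfg))
```

The printed trace applies the length step to `z1` before Deq-Ext, because every iteration restarts the scan from the highest-priority step.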




Whenever a rule is applied, the strategy restarts from the beginning in the next iteration. The strategy gives higher priority to steps that are cheap to compute and likely to lead to conflicts. Some steps are ordered based on dependencies on other steps. For instance, Steps 5 and 7 use normal forms, which are computed in Step 4. The strategy for the BASE calculus is the same, except that Steps 7 and 8 are replaced by one that applies R-Update and R-Nth to all update and nth terms in S.

We point out that the C-Split rule may cause non-termination of the proof strategy 
described above in the presence of cyclic sequence constraints, for instance, constraints 
where sequence variables appear on both sides of an equality. The solver uses methods 
for detecting some of these cycles, to restrict when C-Split is applied. In particular, when $S \models z \approx (u \mathbin{++} s \mathbin{++} w)$, $S \models z \approx (u \mathbin{++} t \mathbin{++} v)$, and s occurs in v,
then C-Split is not applied. Instead, other heuristics are used, and in some cases the 
solver terminates with a response of “unknown” (see e.g., [17] for details). In addition 
to the version shown here, we also use another variation of the C-Split rule where the 
normal forms are matched in reverse (starting from the last terms in the concatenations). 
The implementation also uses fast entailment tests for length inequalities. These tests 
may allow us to conclude which branch of C-Split, if any, is feasible, without having to 
branch on cases explicitly. 
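The cycle test described above is easy to state over normal forms represented as lists of atomic components. The Python sketch below is a simplified reading of that single condition. cvc5's actual heuristics and its fallback to "unknown" are not modeled, and the function names are hypothetical. The `reverse` flag mimics the variant that matches normal forms from their last components.

```python
from typing import List, Optional, Tuple

def split_point(nf1: List[str], nf2: List[str]) -> Optional[Tuple[list, str, list, str, list]]:
    """Decompose nf1 = u ++ [s] ++ w and nf2 = u ++ [t] ++ v at the first
    position where the two normal forms differ."""
    for k, (a, b) in enumerate(zip(nf1, nf2)):
        if a != b:
            return nf1[:k], a, nf1[k + 1:], b, nf2[k + 1:]
    return None

def c_split_allowed(nf1: List[str], nf2: List[str], reverse: bool = False) -> bool:
    if reverse:  # variant that matches the normal forms from the end
        nf1, nf2 = nf1[::-1], nf2[::-1]
    parts = split_point(nf1, nf2)
    if parts is None:
        return False  # no differing components to split on
    _u, s, _w, _t, v = parts
    return s not in v  # s occurring in v signals a cycle: skip C-Split

# z = x1 ++ y and z = y ++ x1 is cyclic, so C-Split is not applied.
print(c_split_allowed(["x1", "y"], ["y", "x1"]))     # False
print(c_split_allowed(["a", "x"], ["a", "y", "b"]))  # True
```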

Although not shown here, the calculus can also accommodate certain extended 
sequence constraints, that is, constraints using a signature with additional functions. 
For example, our implementation supports sequence containment, replacement, and 
reverse. It also supports an extended variant of the update operator, in which the third 
argument is a sequence that overrides the sequence being updated starting from the 
index given in the second argument. Constraints involving these functions are handled 
by reduction rules, similar to those shown in Fig. 5. The implementation is further opti- 
mized by using context-dependent simplifications, which may eagerly infer when cer- 
tain sequence terms can be simplified to constants based on the current set of assertions 
[22]. 
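To illustrate the extended update operator, the following Python sketch gives one plausible semantics. If the index is in bounds, the overriding sequence is written starting at that index and truncated so that the length of the original sequence is preserved. Otherwise, the sequence is unchanged. The length-preserving truncation is our assumption about the intended behavior, and `update_seq` is a hypothetical name.

```python
from typing import List, TypeVar

T = TypeVar("T")

def update_seq(s: List[T], i: int, t: List[T]) -> List[T]:
    """Overwrite s with t starting at index i, keeping len(s) unchanged."""
    if not 0 <= i < len(s):
        return list(s)  # out-of-bounds index: no effect
    return s[:i] + t[:len(s) - i] + s[i + len(t):]

def update(s: List[T], i: int, v: T) -> List[T]:
    # The element-wise update is the special case with a unit sequence.
    return update_seq(s, i, [v])

print(update_seq([1, 2, 3, 4], 1, [9, 9]))  # [1, 9, 9, 4]
print(update_seq([1, 2, 3, 4], 3, [7, 8]))  # [1, 2, 3, 7]
```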


6 Evaluation 


We evaluate the performance of our approach, as implemented in cvc5. The evaluation 
investigates: (i) whether the use of sequences is a viable option for reasoning about 
vectors in programs, (ii) how our approach compares with other sequence solvers, and 
(iii) what is the performance impact of our array-style extended rules. As a baseline, we 
use Version 4.8.14 of the Z3 SMT solver, which supports a theory of sequences with- 
out updates. For cvc5, we evaluate implementations of both the basic calculus (denoted 
cvc5) and the extended array-based calculus (denoted cvc5-a). The benchmarks, solver
configurations, and logs from our runs are available for download.* We ran all exper- 
iments on a cluster equipped with Intel Xeon E5-2620 v4 CPUs. We allocated one 
physical CPU core and 8 GB of RAM for each solver-benchmark pair and used a time 
limit of 300 s. We use the following two sets of benchmarks: 


Array Benchmarks (ARRAYS). The first set of benchmarks is derived from the QF_AX 
benchmarks in SMT-LIB [3]. To generate these benchmarks, we (i) replace declarations 
of arrays with declarations of sequences of uninterpreted sorts, (ii) change the sort of 
index terms to integers, and (iii) replace store with update and select with nth. The 
resulting benchmarks are quantifier-free and do not contain concatenations. Note that 
the original and the derived benchmarks are not equisatisfiable, because sequences take 
into account out-of-bounds cases that do not occur in arrays. For the Z3 runs, we add to 
the benchmarks a definition of update in terms of extraction and concatenation. 
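The exact definition added for the Z3 runs is not spelled out here. One natural choice, sketched below in Python under that assumption, splits the sequence around the updated index with two extractions and reassembles it with concatenation. The sketch assumes extraction follows the SMT-LIB `str.substr` conventions: a start index and a length, with an empty result out of bounds. It checks the encoding against a direct update on every index, including out-of-bounds ones.

```python
from typing import List

def extract(s: List[int], i: int, n: int) -> List[int]:
    """Subsequence of length n starting at i, clipped at the end of s;
    empty if i is out of bounds or n <= 0 (str.substr-style)."""
    if not 0 <= i < len(s) or n <= 0:
        return []
    return s[i:i + n]

def update_direct(s: List[int], i: int, v: int) -> List[int]:
    return s[:i] + [v] + s[i + 1:] if 0 <= i < len(s) else list(s)

def update_via_extract(s: List[int], i: int, v: int) -> List[int]:
    # update(s, i, v) = extract(s, 0, i) ++ [v] ++ extract(s, i + 1, |s| - i - 1)
    # when 0 <= i < |s|, and s otherwise.
    if not 0 <= i < len(s):
        return list(s)
    return extract(s, 0, i) + [v] + extract(s, i + 1, len(s) - i - 1)

s = [5, 6, 7]
assert all(update_via_extract(s, i, 0) == update_direct(s, i, 0) for i in range(-2, 5))
print(update_via_extract(s, 1, 0))  # [5, 0, 7]
```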


Smart Contract Verification (DIEM). The second set of benchmarks consists of veri- 
fication conditions generated by running the Move Prover [24] on smart contracts writ- 
ten for the Diem framework. By default, the encoding does not use the sequence update 


* http://dx.doi.org/10.5281/zenodo.6146565.


Reasoning About Vectors Using an SMT Theory of Sequences 141 


(a)
Set                 cvc5 (w/ update)   cvc5-a (w/ update)     z3
ARRAYS (551)  Slvd               242                  390    170
              Time               162                  303   4329
DIEM (558)    Slvd               542                  547    443
              Time               518                  440    639

[(b) ARRAYS, (c) DIEM: log-scale scatter plots comparing cvc5 and cvc5-a runtimes; x-axis: cvc5 [s]]


Fig. 7. Fig. 7a lists the number of solved benchmarks and the total time (in seconds) on commonly solved benchmarks. The scatter plots compare the base solver (cvc5) and the extended solver (cvc5-a) on the ARRAYS (Fig. 7b) and DIEM (Fig. 7c) benchmarks.


operation, and so Z3 can be used directly. However, we also modified the Move Prover 
encoding to generate benchmarks that do use the update operator, and ran cvc5 on them. 
In addition to using the sequence theory, the benchmarks make heavy use of quantifiers 
and the SMT-LIB theory of datatypes. 

Figure 7a summarizes the results in terms of number of solved benchmarks and total 
time in seconds on commonly solved benchmarks. The configuration that solves the 
largest number of benchmarks is the implementation of the extended calculus (cvc5-a).
This approach also successfully solves most of the DIEM benchmarks, which suggests 
that sequences are a promising option for encoding vectors in programs. The results 
further show that the sequences solver of cvc5 significantly outperforms Z3 on both the 
number of solved benchmarks and the solving time on commonly solved benchmarks. 

Figures 7b and 7c show scatter plots comparing cvc5 and cvc5-a on the two benchmark sets. We can see a clear trend towards better performance when using the extended
solver. In particular, the table shows that in addition to solving the most benchmarks, 
cvc5-a is also fastest on the commonly solved instances from the DIEM benchmark set. 

For the ARRAYS set, we can see that some benchmarks are slower with the extended 
solver. This is also reflected in the table, where cvc5-a is slower on the commonly 
solved instances. This is not too surprising, as the extra machinery of the extended 
solver can sometimes slow down easy problems. As problems get harder, however, the 
benefit of the extended solver becomes clear. For example, if we drop Z3 and consider 
just the commonly solved instances between cvc5 and cvc5-a (of which there are 242), cvc5-a is about 2.47× faster (426 s vs. 1053 s). Of course, further improving the performance of cvc5-a is something we plan to explore in future work.


7 Conclusion 


We introduced calculi for checking satisfiability in the theory of sequences, which can 
be used to model the vector data type. We described our implementation in cvc5 and 
provided an evaluation, showing that the proposed theory is rich enough to naturally 
express verification conditions without introducing quantifiers, and that our implementation is efficient. We believe that verification tools can benefit from changing their encoding of verification conditions that involve vectors to use the proposed theory and implementation.

We plan to propose the incorporation of this theory in the SMT-LIB standard and 
contribute our benchmarks to SMT-LIB. As future research, we plan to integrate other 
approaches for array solving into our basic solver. We also plan to study the politeness 
[16,20] and decidability of various fragments of the theory of sequences. 
Abstract. The importance of subsumption testing for redundancy elimination in automated reasoning for first-order logic is well known. Although the problem is already NP-complete for first-order clauses, the test pipelines developed in the meantime decide subsumption efficiently in almost all practical cases. We consider subsumption between first-order clauses of the Bernays-Schönfinkel fragment over linear real arithmetic constraints: BS(LRA). The bottleneck in this setup is deciding implication between the LRA constraints of two clauses. Our new sample point heuristic preempts expensive implication decisions in about 94% of all cases in benchmarks. Combined with filtering techniques for the first-order BS part of clauses, it again results in an efficient subsumption test pipeline for BS(LRA) clauses.
1 Introduction


The elimination of redundant clauses is crucial for efficient automated reasoning in first-order logic. In a resolution [5,50] or superposition setting [4,44], a newly inferred clause might be subsumed by a clause that is already known (forward subsumption), or it might subsume a known clause (backward subsumption). Although the SCL family of calculi [1,11,21] does not require forward subsumption tests, a property also inherent to the propositional CDCL (Conflict Driven Clause Learning) approach [8,34,41,55,63], backward subsumption, and hence subsumption, remains an important test in order to remove redundant clauses.
In this work we present advances in deciding subsumption for constrained clauses, specifically employing the Bernays-Schönfinkel fragment as foreground logic and linear real arithmetic as background theory, BS(LRA). BS(LRA) is of particular interest because it can be used to model supervisors, i.e., components in technical systems that control system functionality. An example of a supervisor is the electronic control unit of a combustion engine. The logics we use to model supervisors and their properties are called SupERLogs: (Sup)ervisor (E)ffective (R)easoning (Log)ics. SupERLogs are instances of function-free first-order logic extended with arithmetic [18], which means BS(LRA) is an example of a SupERLog.

Subsumption is an important redundancy criterion in the context of hier- 
archic clausal reasoning [6,11,20,35,37]. At the heart of this paper is a new 
technique to speed up the treatment of linear arithmetic constraints as part of 
deciding subsumption. For every clause, we store a solution of its associated con- 
straints, which is used to quickly falsify implication decisions, acting as a filter, 
called the sample point heuristic. In our experiments with various benchmarks, 
the technique is very effective: It successfully preempts expensive implication 
decisions in about 94% of cases. We elaborate on these findings in Sect. 4. 

For example, consider three BS clauses, none of which subsumes another:

$$C_1 := P(a, x) \qquad C_2 := \neg P(y, z) \lor Q(y, z, b) \qquad C_3 := \neg R(b) \lor Q(a, x, b)$$

Let $C_4$ be the resolvent of $C_1$ and $C_2$ upon the atom $P(a, x)$, i.e., $C_4 := Q(a, z, b)$. Now $C_4$ backward-subsumes $C_3$ with matcher $\sigma := \{z \mapsto x\}$, i.e., $C_4\sigma \subseteq C_3$; thus $C_3$ is redundant and can be eliminated. Now, consider an extension of the above clauses with some simple LRA constraints following the same reasoning:


$$C_1' := x \geq 1 \,\|\, P(a, x)$$
$$C_2' := z \geq 0 \,\|\, \neg P(y, z) \lor Q(y, z, b)$$
$$C_3' := x \geq 0 \,\|\, \neg R(b) \lor Q(a, x, b)$$

where $\|$ is interpreted as an implication, i.e., clause $C_1'$ stands for $\neg(x \geq 1) \lor P(a, x)$ or simply $x < 1 \lor P(a, x)$. The respective resolvent on the constrained clauses is $C_4' := z \geq 0, z \geq 1 \,\|\, Q(a, z, b)$ or, after constraint simplification, $C_4' := z \geq 1 \,\|\, Q(a, z, b)$, because $z \geq 1$ implies $z \geq 0$. For the constrained clauses, $C_4'$ no longer subsumes $C_3'$ with matcher $\sigma = \{z \mapsto x\}$, because $x \geq 0$ does not LRA-imply $x \geq 1$. Now, if we store the sample point $x = 0$ as a solution for the constraint of clause $C_3'$, this sample point already reveals that $x \geq 0$ does not LRA-imply $x \geq 1$. This constitutes the basic idea behind our sample point heuristic. In general, constraints are not just simple bounds as in the above example, and sample points are solutions to the system of linear inequalities of the LRA constraint of a clause.
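The following Python sketch illustrates the filter on the example above. Constraints are lists of linear atoms with integer coefficients. The matcher maps variables of the subsuming clause to variables of the candidate clause, and the stored sample point is a solution of the candidate's constraint. Names and data layout are our own assumptions. The sketch also assumes that the matcher covers every variable of the subsuming constraint. When the filter is inconclusive, a full LRA implication check is still required.

```python
import operator
from fractions import Fraction
from typing import Dict, List, Tuple

# An LRA atom sum(c_v * v) <op> k, stored as (coefficients, op, k).
Atom = Tuple[Dict[str, int], str, int]
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}

def holds(atom: Atom, point: Dict[str, Fraction]) -> bool:
    coeffs, op, k = atom
    lhs = sum(Fraction(c) * point[v] for v, c in coeffs.items())
    return OPS[op](lhs, Fraction(k))

def rename(atom: Atom, sigma: Dict[str, str]) -> Atom:
    """Apply a variable-to-variable matcher, merging coefficients."""
    coeffs, op, k = atom
    out: Dict[str, int] = {}
    for v, c in coeffs.items():
        w = sigma.get(v, v)
        out[w] = out.get(w, 0) + c
    return out, op, k

def sample_point_refutes(subsumer: List[Atom], sigma: Dict[str, str],
                         sample: Dict[str, Fraction]) -> bool:
    """True if the stored solution of the candidate clause's constraint
    violates the matched constraint of the subsuming clause, so the
    implication (and hence subsumption) fails without an LRA check.
    False means the filter is inconclusive."""
    return not all(holds(rename(a, sigma), sample) for a in subsumer)

c4 = [({"z": 1}, ">=", 1)]  # constraint of C4': z >= 1
print(sample_point_refutes(c4, {"z": "x"}, {"x": Fraction(0)}))  # True
```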

Please note that our test on LRA constraints is based on LRA theory impli- 
cation and not on a syntactic notion such as subsumption on the first-order part 
of the clause. In this sense it is “stronger” than its first-order counterpart. This 
fact is stressed by the following example, taken from [26, Ex. 2], which shows 
that first-order implication does not imply subsumption. Let 


$$C_1 := \neg P(x, y) \lor \neg P(y, z) \lor P(x, z)$$
$$C_2 := \neg P(a, b) \lor \neg P(b, c) \lor \neg P(c, d) \lor P(a, d)$$

Then we have $C_1 \models C_2$, but for all $\sigma$ we have $C_1\sigma \not\subseteq C_2$. Constructing $\sigma$ from left to right, we obtain $\sigma := \{x \mapsto a, y \mapsto b, z \mapsto c\}$, but $P(a, c) \notin C_2$. Constructing $\sigma$ from right to left, we obtain $\sigma := \{z \mapsto d, x \mapsto a, y \mapsto c\}$, but $\neg P(a, c) \notin C_2$.
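Because $C_2$ is ground, any matcher must map the variables of $C_1$ to constants of $C_2$, so a brute-force search over these finitely many candidates settles the question. The Python sketch below uses our own hypothetical literal encoding. It confirms that no σ makes $C_1\sigma$ a sub-multiset of $C_2$, even though $C_1$, read as transitivity of P, entails $C_2$.

```python
from collections import Counter
from itertools import product

# Literals are (sign, predicate, args); variables start with '?'.
C1 = [("-", "P", ("?x", "?y")), ("-", "P", ("?y", "?z")), ("+", "P", ("?x", "?z"))]
C2 = [("-", "P", ("a", "b")), ("-", "P", ("b", "c")),
      ("-", "P", ("c", "d")), ("+", "P", ("a", "d"))]

def subst(sigma, lit):
    sign, pred, args = lit
    return sign, pred, tuple(sigma.get(t, t) for t in args)

def matchers(c1, c2):
    """All sigma such that c1*sigma is a sub-multiset of the ground clause c2."""
    vars1 = sorted({t for _, _, args in c1 for t in args if t.startswith("?")})
    consts = sorted({t for _, _, args in c2 for t in args})
    target = Counter(c2)
    found = []
    for image in product(consts, repeat=len(vars1)):
        sigma = dict(zip(vars1, image))
        # Counter difference is empty iff multiset inclusion holds.
        if not Counter(subst(sigma, lit) for lit in c1) - target:
            found.append(sigma)
    return found

print(matchers(C1, C2))  # []
```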


Related Work. Treatment of questions regarding the complexity of deciding sub- 
sumption of first-order clauses [27] dates back more than thirty years. Notions 
of subsumption, varying in generality, are studied in different sub-fields of the- 
orem proving, whereas we restrict our attention to first-order theorem proving. 
Modern implementations typically decide several thousand instances of this problem per second: in [62, Sect. 2], Voronkov states that initial versions of Vampire "seemed to [...] deadlock" without efficient implementations to decide (forward) subsumption.

In order to reduce the number of clauses out of a set of clauses to be con- 
sidered for pairwise subsumption checking, the best known practice in first- 
order theorem proving is to use (imperfect) indexing data structures as a means 
for pre-filtering and research concerning appropriate techniques is plentiful, see 
[24,25,27-30,33,39,40,43,45-49,52-54,56,59,61] for an evaluation of these techniques. Here we concentrate on the efficiency of a subsumption check between
two clauses and therefore do not take indexing techniques into account. Fur- 
thermore, the implication test between two linear arithmetic constraints is of 
a semantic nature and is not related to any syntactic features of the involved 
constraints and can therefore hardly be filtered by a syntactic indexing approach. 

In addition to pre-filtering via indexing, almost all above mentioned imple- 
mentations of first-order subsumption tests rely on additional filters on the clause 
level. The idea is to generate an abstraction of clauses together with an ordering 
relation such that the ordering relation is necessary to hold between two clauses 
in order for one clause to subsume the other. Furthermore, the abstraction as 
well as the ordering relation should be efficiently computable. For example, a necessary condition for a first-order clause $C_1$ to subsume a first-order clause $C_2$ is $|C_1| \leq |C_2|$, i.e., $C_1$ cannot contain more literals than $C_2$, since $C_1\sigma \subseteq C_2$ is a multiset inclusion and applying σ does not change the number of literals. Further abstractions used by various implementations rely on the number of ground literals, the depth of literals and terms, and the occurring predicate and function symbols. For the BS(LRA) clauses considered here, the structure of the first-order BS part, which consists of predicates and flat terms (variables and constants) only, is not particularly rich.
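Filters of this kind compare cheap clause abstractions before any matching is attempted. The Python sketch below combines three of the abstractions mentioned above: the clause size (number of literals), the ground literals, and the occurring predicate symbols counted per polarity. The encoding and function names are hypothetical. Real provers use richer feature sets together with indexing.

```python
from collections import Counter

def features(clause):
    """Abstraction of a clause given as (sign, predicate, args) literals,
    where variables are strings starting with '?'."""
    ground = Counter(lit for lit in clause
                     if not any(t.startswith("?") for t in lit[2]))
    symbols = Counter((sign, pred) for sign, pred, _ in clause)
    return len(clause), ground, symbols

def may_subsume(c1, c2):
    """Necessary, not sufficient, condition for c1 to subsume c2: matching
    preserves the literal count, ground literals, and signed predicates."""
    n1, g1, s1 = features(c1)
    n2, g2, s2 = features(c2)
    return n1 <= n2 and not (g1 - g2) and not (s1 - s2)

print(may_subsume([("+", "P", ("?x",))], [("+", "P", ("?y",)), ("+", "Q", ("?z",))]))  # True
print(may_subsume([("-", "R", ("b",))], [("+", "R", ("b",))]))  # False
```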

The exploration of sample points has already been studied in the context of 
first-order clauses with arithmetic constraints. In [17,36] it was used to improve 
the performance of iSAT [23] on testing non-linear arithmetic constraints. In 
general, iSAT tests satisfiability by interval propagation for variables. If intervals get "too small", it typically gives up; however, sometimes the explicit generation of a sample point for a small interval can still lead to a certificate for satisfiability.
This technique was successfully applied in [17], but was not used for deciding 
subsumption of constrained clauses. 
Motivation. The main motivation for this work is the realization that computing the implication decisions required to treat constraints of the background theory is the bottleneck of a BS(LRA) subsumption check in practice. Inspired by the success of filtering techniques in first-order logic, we devise an exceptionally effective filter for constraints and adapt well-known first-order filters to the BS fragment. Our sample point heuristic for LRA could easily be generalized to other arithmetic theories as well as full first-order logic.


Structure. The paper is structured as follows. After a section defining BS(LRA) 
and common notions and notation, Sect. 2, we define redundancy notions and our 
sample point heuristic in Sect. 3. Section 4 justifies the success of the sample point 
heuristic by numerous experiments in various application domains of BS(LRA). 
The paper ends with a discussion of the obtained results, Sect. 5. Binaries, utility 
scripts, benchmarking instances used as input, and the output used for evaluation 
may be obtained online [13]. 


2 Preliminaries 


We briefly recall the basic logical formalisms and notations we build upon [10]. 
Our starting point is a standard many-sorted first-order language for BS with 
constants (denoted a,b,c), without non-constant function symbols, with vari- 
ables (denoted w, x,y,z), and predicates (denoted P,Q, R) of some fixed arity. 
Terms (denoted t, s) are variables or constants. An atom (denoted A, B) is an expression $P(t_1, \dots, t_n)$ for a predicate P of arity n. A positive literal is an atom A and a negative literal is a negated atom $\neg A$. We define $\mathrm{comp}(A) = \neg A$, $\mathrm{comp}(\neg A) = A$, $|A| = A$ and $|\neg A| = A$. Literals are usually denoted L, K, H. Formulas are defined in the usual way using quantifiers $\forall, \exists$ and the boolean connectives $\neg, \lor, \land, \rightarrow$, and $\leftrightarrow$.

A clause (denoted C, D) is a universally closed disjunction of literals $A_1 \lor \dots \lor A_n \lor \neg B_1 \lor \dots \lor \neg B_m$. Clauses are identified with their respective multisets and all standard multiset operations are extended to clauses. For instance, $C \subseteq D$ means that all literals in C also appear in D respecting their number of occurrences. A clause is Horn if it contains at most one positive literal, i.e., $n \leq 1$, and a unit clause if it has exactly one literal, i.e., $n + m = 1$. We write $C^+$ for the set of positive literals, or conclusions of C, i.e., $C^+ := \{A_1, \dots, A_n\}$, and respectively $C^-$ for the set of negative literals, or premises of C, i.e., $C^- := \{\neg B_1, \dots, \neg B_m\}$. If Y is a term, formula, or a set thereof, vars(Y) denotes the set of all variables in Y, and Y is ground if $\mathrm{vars}(Y) = \emptyset$.

The Bernays-Schönfinkel Clause Fragment (BS) in first-order logic consists of first-order clauses where all involved terms are either variables or constants. The Horn Bernays-Schönfinkel Clause Fragment (HBS) consists of all sets of BS Horn clauses.

A substitution σ is a function from variables to terms with a finite domain $\mathrm{dom}(\sigma) = \{x \mid x\sigma \neq x\}$ and codomain $\mathrm{codom}(\sigma) = \{x\sigma \mid x \in \mathrm{dom}(\sigma)\}$. We denote substitutions by σ, δ, ρ. The application of substitutions is often written postfix, as in xσ, and is homomorphically extended to terms, atoms, literals, clauses, and quantifier-free formulas. A substitution σ is ground if codom(σ) is ground. Let Y denote some term, literal, clause, or clause set. A substitution σ is a grounding for Y if Yσ is ground, and Yσ is a ground instance of Y in this case. We denote by gnd(Y) the set of all ground instances of Y, and by $\mathrm{gnd}_B(Y)$ the set of all ground instances over a given set of constants B. The most general unifier $\mathrm{mgu}(Z_1, Z_2)$ of two terms/atoms/literals $Z_1$ and $Z_2$ is defined as usual, and we assume that it does not introduce fresh variables and is idempotent.

We assume a standard many-sorted first-order logic model theory, and write $\mathcal{A} \models \phi$ if an interpretation $\mathcal{A}$ satisfies a first-order formula φ. A formula ψ is a logical consequence of φ, written $\phi \models \psi$, if $\mathcal{A} \models \psi$ for all $\mathcal{A}$ such that $\mathcal{A} \models \phi$. Sets of clauses are semantically treated as conjunctions of clauses with all variables quantified universally.


2.1 Bernays-Schönfinkel with Linear Real Arithmetic


The extension of BS with linear real arithmetic, BS(LRA), is the basis for the 
formalisms studied in this paper. We consider a standard many-sorted first- 
order logic with one first-order sort F and with the sort R for the real numbers. 
Given a clause set N, the interpretations $\mathcal{A}$ of our sorts are fixed: $R^{\mathcal{A}} = \mathbb{R}$ and $F^{\mathcal{A}} = F$. This means that $F^{\mathcal{A}}$ is a Herbrand interpretation, i.e., F is the set of
first-order constants in N, or a single constant out of the signature if no such 
constant occurs. Note that this is not a deviation from standard semantics in 
our context as for the arithmetic part the canonical domain is considered and 
the first-order sort has the finite model property over the occurring constants 
(note that equality is not part of BS). 

Constant symbols, arithmetic function symbols, variables, and predicates are 
uniquely declared together with their respective sort. The unique sort of a con- 
stant symbol, variable, predicate, or term is denoted by the function sort(Y) 
and we assume all terms, atoms, and formulas to be well-sorted. We assume 
pure input clause sets, which means the only constants of sort R are (rational) 
numbers. This means the only constants that we do allow are rational numbers $c \in \mathbb{Q}$ and the constants defining our finite first-order sort F. Irrational
numbers are not allowed by the standard definition of the theory. The current 
implementation comes with the caveat that only integer constants can be parsed. 
Satisfiability of pure BS(LRA) clause sets is semi-decidable, e.g., using hierar- 
chic superposition [6] or SCL(T) [11]. Impure BS(LRA) is no longer compact 
and satisfiability becomes undecidable, but its restriction to ground clause sets 
is decidable [22]. 

All arithmetic predicates and functions are interpreted in the usual way. 
An interpretation of BS(LRA) coincides with $\mathcal{A}^{LRA}$ on arithmetic predicates
and functions, and freely interprets free predicates. For pure clause sets this is 
well-defined [6]. Logical satisfaction and entailment is defined as usual, and uses 
similar notation as for BS. 


Example 1. The clause $y < 5 \lor x' \neq x + 1 \lor \neg S_0(x, y) \lor S_1(x', 0)$ is part of a timed automaton with two clocks x and y modeled in BS(LRA). It represents a transition from state $S_0$ to state $S_1$ that can be traversed only if clock y is at least 5 and that resets y to 0 and increases x by 1.


Arithmetic terms are constructed from a set $\mathcal{X}$ of variables, the set of integer constants $c \in \mathbb{Z}$, and binary function symbols + and − (written infix). Additionally, we allow multiplication · if one of the factors is an integer constant. Multiplication only serves us as syntactic sugar to abbreviate other arithmetic terms, e.g., $x + x + x$ is abbreviated to $3 \cdot x$. Atoms in BS(LRA) are either first-order atoms (e.g., $P(13, x)$) or (linear) arithmetic atoms (e.g., $x < 42$). Arithmetic atoms are denoted by λ and may use the predicates $<, \leq, \neq, =, \geq, >$, which are written infix and have the expected fixed interpretation. We use $\triangleleft$ as a placeholder for any of these predicates. Predicates used in first-order atoms are called free. First-order literals and related notation are defined as before. Arithmetic literals coincide with arithmetic atoms, since the arithmetic predicates are closed under negation, e.g., $\neg(x \geq 42) \equiv x < 42$.

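BS(LRA) clauses are defined as for BS but using BS(LRA) atoms. We often write clauses in the form $\Lambda \,\|\, C$, where C is a clause solely built of free first-order literals and Λ is a multiset of LRA atoms called the constraint of the clause. A clause of the form $\Lambda \,\|\, C$ is therefore also called a constrained clause. The semantics of $\Lambda \,\|\, C$ is as follows:

$$\Lambda \,\|\, C \quad\text{iff}\quad \Big(\bigwedge_{\lambda \in \Lambda} \lambda\Big) \rightarrow C \quad\text{iff}\quad \Big(\bigvee_{\lambda \in \Lambda} \neg\lambda\Big) \lor C$$

For example, the clause $x > 1 \lor y \neq 5 \lor \neg Q(x) \lor R(x, y)$ is also written $x \leq 1, y = 5 \,\|\, \neg Q(x) \lor R(x, y)$. The negation $\neg(\Lambda \,\|\, C)$ of a constrained clause $\Lambda \,\|\, C$ where $C = A_1 \lor \dots \lor A_n \lor \neg B_1 \lor \dots \lor \neg B_m$ is thus equivalent to $\big(\bigwedge_{\lambda \in \Lambda} \lambda\big) \land \neg A_1 \land \dots \land \neg A_n \land B_1 \land \dots \land B_m$. Note that since the neutral element of conjunction is ⊤, an empty constraint is valid, i.e., equivalent to true.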

An assignment for a constraint Λ is a substitution (denoted β) that maps all variables in vars(Λ) to real numbers $c \in \mathbb{R}$. An assignment is a solution for a constraint Λ if all atoms $\lambda \in (\Lambda\beta)$ evaluate to true. A constraint Λ is satisfiable if there exists a solution for Λ. Otherwise it is unsatisfiable. Note that assignments can be extended to C by also mapping variables of the first-order sort accordingly.

A clause or clause set is abstracted if its first-order literals contain only variables or first-order constants. Every clause C is equivalent to an abstracted clause that is obtained by replacing each non-variable arithmetic term t that occurs in a first-order atom by a fresh variable x while adding an arithmetic atom $x \neq t$ to C. We assume abstracted clauses for theory development, but we prefer non-abstracted clauses in examples for readability, e.g., a unit clause P(3, 5) is considered in the development of the theory as the clause $x = 3, y = 5 \,\|\, P(x, y)$. In the implementation, we mostly prefer abstracted clauses, except that we allow integer constants $c \in \mathbb{Z}$ to appear as arguments of first-order literals. In some cases, this makes it easier to recognize whether two clauses can be matched or not. For instance, we see by syntactic comparison that the two unit clauses P(3, 5) and P(0, 1) have no substitution σ such that $P(3, 5) = P(0, 1)\sigma$. For the abstracted versions, on the other hand, $x = 3, y = 5 \,\|\, P(x, y)$ and $u = 0, v = 1 \,\|\, P(u, v)$, we can find a matching substitution for the first-order part, $\sigma := \{u \mapsto x, v \mapsto y\}$, and would have to check the constraints semantically to exclude the matching.
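This trade-off can be sketched in a few lines of Python. The function `abstract` moves integer arguments into the constraint as equations over fresh variables. The function `syntactic_clash` exploits integer arguments that were kept in place. Both functions and the literal encoding are hypothetical illustrations, not the authors' implementation.

```python
from itertools import count

def abstract(lit, fresh):
    """Replace integer arguments of a first-order literal by fresh variables
    and return the constraint atoms (v, '=', c) that go in front of '||'."""
    sign, pred, args = lit
    constraint, new_args = [], []
    for t in args:
        if isinstance(t, int):
            v = next(fresh)
            constraint.append((v, "=", t))
            new_args.append(v)
        else:
            new_args.append(t)
    return constraint, (sign, pred, tuple(new_args))

def syntactic_clash(lit1, lit2):
    """For literals with the same sign and predicate: distinct integer
    constants at the same argument position rule out any matcher
    without looking at constraints."""
    return any(isinstance(a, int) and isinstance(b, int) and a != b
               for a, b in zip(lit1[2], lit2[2]))

fresh = (f"?v{k}" for k in count())
print(abstract(("+", "P", (3, 5)), fresh))
# ([('?v0', '=', 3), ('?v1', '=', 5)], ('+', 'P', ('?v0', '?v1')))
print(syntactic_clash(("+", "P", (3, 5)), ("+", "P", (0, 1))))  # True
```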


Hierarchic Resolution. One inference rule, foundational to most algorithms for 
solving constrained first-order clauses, is hierarchic resolution [6]: 


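$$\frac{\Lambda_1 \,\|\, L_1 \lor C_1 \qquad \Lambda_2 \,\|\, L_2 \lor C_2 \qquad \sigma = \mathrm{mgu}(L_1, \mathrm{comp}(L_2))}{(\Lambda_1, \Lambda_2 \,\|\, C_1 \lor C_2)\sigma}$$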


The conclusion is called the hierarchic resolvent (of the two clauses in the premise). A refutation is a sequence of resolution steps that produces a clause Λ || ⊥ with $\mathcal{A}^{\mathrm{LRA}} \models \Lambda\delta$ for some grounding δ. Every set N of BS(LRA) clauses considered here is sufficiently complete [6], because all constants of the arithmetic sort are numbers. Hence, hierarchic resolution is sound and refutationally complete for N [6,7]. Hierarchic unit resolution is a special case of hierarchic resolution that only combines two clauses if one of them is a unit clause. Hierarchic unit resolution is sound and complete for HBS(LRA) [6,7], but not even refutationally complete for BS(LRA).
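The inference rule above can be illustrated with a minimal Python sketch of a single hierarchic resolution step on function-free clauses. It assumes both premises are already variable-disjoint, represents variables as strings and constants as integers, and encodes arithmetic atoms as coefficient dictionaries; all names (`mgu`, `apply_arith`, `hierarchic_resolvent`) are illustrative and not taken from SPASS-SPL.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple, Union

Term = Union[str, int]                        # str = variable, int = constant
Literal = Tuple[bool, str, Tuple[Term, ...]]  # (positive?, predicate, args)
Arith = Tuple[Dict[str, Fraction], str, Fraction]  # sum(coeffs[v]*v) op bound
Clause = Tuple[List[Arith], List[Literal]]    # constraint || first-order part


def mgu(args1, args2) -> Optional[Dict[str, Term]]:
    """Most general unifier of two function-free argument tuples."""
    sigma: Dict[str, Term] = {}

    def walk(t: Term) -> Term:
        while isinstance(t, str) and t in sigma:
            t = sigma[t]
        return t

    for s, t in zip(args1, args2):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if isinstance(s, str):
            sigma[s] = t
        elif isinstance(t, str):
            sigma[t] = s
        else:
            return None                       # two distinct constants clash
    return {v: walk(v) for v in sigma}


def apply_arith(atom: Arith, sigma: Dict[str, Term]) -> Arith:
    """Apply sigma to a linear atom; numbers bound to variables move to the bound."""
    coeffs, op, bound = atom
    new: Dict[str, Fraction] = {}
    for v, a in coeffs.items():
        t = sigma.get(v, v)
        if isinstance(t, str):
            new[t] = new.get(t, 0) + a
        else:
            bound -= a * t
    return {v: a for v, a in new.items() if a != 0}, op, bound


def hierarchic_resolvent(cl1: Clause, i: int, cl2: Clause, j: int) -> Optional[Clause]:
    """Resolve literal i of cl1 with literal j of cl2 (premises variable-disjoint)."""
    (lam1, c1), (lam2, c2) = cl1, cl2
    pos1, p1, args1 = c1[i]
    pos2, p2, args2 = c2[j]
    if p1 != p2 or pos1 == pos2:
        return None                           # literals are not complementary
    sigma = mgu(args1, args2)                 # plays the role of mgu(L1, comp(L2))
    if sigma is None:
        return None
    lam = [apply_arith(a, sigma) for a in lam1 + lam2]
    rest = c1[:i] + c1[i + 1:] + c2[:j] + c2[j + 1:]
    lits = [(s, p, tuple(sigma.get(t, t) for t in args)) for s, p, args in rest]
    return lam, lits


# x >= 1 || ~P(x, y) v Q(y)   resolved with   || P(3, z)
cl1 = ([({'x': Fraction(1)}, '>=', Fraction(1))],
       [(False, 'P', ('x', 'y')), (True, 'Q', ('y',))])
cl2 = ([], [(True, 'P', (3, 'z'))])
print(hierarchic_resolvent(cl1, 0, cl2, 0))
# ([({}, '>=', Fraction(-2, 1))], [(True, 'Q', ('z',))])
```

The resolvent is 0 ≥ −2 || Q(z), i.e., the instance 3 ≥ 1 of the constraint moved into canonical form; its constraint is ground and true, so a real prover would simplify it away.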

Most algorithms for Bernays-Schönfinkel, first-order logic, and beyond utilize resolution. The SCL(T) calculus for HBS(LRA) uses hierarchic resolution
in order to learn from the conflicts it encounters during its search. The hierar- 
chic superposition calculus on the other hand derives new clauses via hierarchic 
resolution based on an ordering. The goal is to either derive the empty clause 
or a saturation of the clause set, i.e., a state from which no new clauses can be 
derived. Each of those algorithms must derive new clauses in order to progress, 
but their subroutines also get progressively slower as more clauses are derived. In 
order to increase efficiency, it is necessary to eliminate clauses that are obsolete. 
One measure that determines whether a clause is useful or not is redundancy. 


Redundancy. In order to define redundancy for constrained clauses, we need an H-order, i.e., a well-founded, total, strict ordering ≺ on ground literals such that literals in the constraints (in our case arithmetic literals) are always smaller than first-order literals. Such an ordering can be lifted to constrained clauses and sets thereof by its respective multiset extension. Hence, we overload any such order ≺ for literals, constrained clauses, and sets of constrained clauses if the meaning is clear from the context. We define ⪯ as the reflexive closure of ≺ and $N^{\preceq \Lambda \| C} := \{D \mid D \in N \text{ and } D \preceq \Lambda \| C\}$. An instance of an LPO [15] with appropriate precedence can serve as an H-order.


Definition 2 (Clause Redundancy). A ground clause Λ || C is redundant with respect to a set N of ground clauses and an H-order ≺ if $N^{\preceq \Lambda \| C} \models \Lambda \| C$. A clause Λ || C is redundant with respect to a clause set N and an H-order ≺ if for all Λ′ || C′ ∈ gnd(Λ || C) the clause Λ′ || C′ is redundant with respect to gnd(N).
If a clause Λ || C is redundant with respect to a clause set N, then it can be
removed from N without changing its semantics. Determining clause redundancy 
is an undecidable problem [11,63]. However, there are special cases of redundant 
clauses that can be easily checked, e.g., tautologies and subsumed clauses. Tech- 
niques for tautology deletion and subsumption deletion are the most common 
elimination techniques in modern first-order provers. 

A tautology is a clause that evaluates to true independent of the predicate 
interpretation or assignment. It is therefore redundant with respect to all orders and clause sets, even the empty set.


Corollary 3 (Tautology for Constrained Clauses). A clause Λ || C is a tautology if the existential closure of ¬(Λ || C) is unsatisfiable.


Since ¬(Λ || C) is essentially ground (by existential closure and Skolemization), its satisfiability can be checked with an appropriate SMT solver, i.e., an SMT solver that supports unquantified uninterpreted functions coupled with linear real arithmetic. In [2], it is recommended to check only the following conditions for tautology deletion in hierarchic superposition:


Corollary 4 (Tautology Check). A clause Λ || C is a tautology if the existential closure of Λ is unsatisfiable or if C contains two literals L₁ and L₂ with L₁ = comp(L₂).
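The two conditions of Corollary 4 can be sketched as follows. The first-order condition is a purely syntactic search for a complementary pair; the theory condition is delegated to `lra_unsat`, a hypothetical caller-supplied oracle standing in for an LRA solver (here a stub), so the sketch does not fix any particular solver API.

```python
from typing import Callable, List, Sequence, Tuple

Literal = Tuple[bool, str, Tuple[object, ...]]   # (positive?, predicate, args)


def is_tautology_cor4(constraint: Sequence[object], literals: List[Literal],
                      lra_unsat: Callable[[Sequence[object]], bool]) -> bool:
    """Sufficient tautology test of Corollary 4 for the clause constraint || literals.

    lra_unsat is a hypothetical oracle deciding whether the existential
    closure of the constraint is unsatisfiable.
    """
    present = set(literals)
    # First-order side: purely syntactic search for L1 = comp(L2).
    if any((not pos, pred, args) in present for pos, pred, args in literals):
        return True
    # Theory side: an unsatisfiable constraint makes the clause trivially true.
    return lra_unsat(constraint)


def never_unsat(constraint):
    """Stub oracle used only for this illustration."""
    return False


# x = y || P(x) v ~P(y) is a tautology, but Corollary 4 does not detect it.
print(is_tautology_cor4(['x = y'], [(True, 'P', ('x',)), (False, 'P', ('y',))],
                        never_unsat))  # False
```

Checking the syntactic condition first keeps the expensive oracle call for the cases where it is actually needed.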


The advantage is that the check on the first-order side of the clause is still 
purely syntactic and corresponds to the tautology check for pure first-order logic. 
Nonetheless, there are tautologies that are not captured by Corollary 4, e.g., x = y || P(x) ∨ ¬P(y). The SCL(T) calculus on the other hand requires no
tautology checks because it never learns tautologies as part of its conflict analysis 
[1,11,21]. This property is also inherent to the propositional CDCL (Conflict 
Driven Clause Learning) approach [8,34,41,55,63]. 


3 Subsumption for Constrained Clauses 


A subsumed constrained clause is a clause that is redundant with respect to a 
single clause in our clause set. Formally, subsumption is defined as follows. 


Definition 5 (Subsumption for Constrained Clauses [2]). A constrained clause Λ₁ || C₁ subsumes another constrained clause Λ₂ || C₂ if there exists a substitution σ such that C₁σ ⊆ C₂, vars(Λ₁σ) ⊆ vars(Λ₂), and the universal closure of Λ₂ → (Λ₁σ) holds in LRA.


Eliminating redundant clauses is crucial for the efficient operation of an auto- 
matic first-order theorem prover. Although subsumption is considered one of the 
easier redundancy relationships that we can check in practice, it is still a hard 
problem in general: 


Lemma 6 (Complexity of Subsumption in the BS Fragment). Deciding subsumption for a pair of BS clauses is NP-complete.
Proof. Containment in NP follows from the fact that the size of subsumption matchers is limited by the subsumed clause and set inclusion of literals can be decided in polynomial time. For the hardness part, consider the following polynomial-time reduction from 3-SAT. Take a propositional clause set N where all clauses have length three. Now introduce a 6-place predicate R and encode each propositional variable P by a first-order variable $x_P$. Then a propositional clause L₁ ∨ L₂ ∨ L₃ can be encoded by an atom $R(x_{P_1}, p_1, x_{P_2}, p_2, x_{P_3}, p_3)$, where pᵢ is 0 if Lᵢ is negative and 1 otherwise, and Pᵢ is the propositional variable of Lᵢ. This way the clause set N can be represented by a single BS clause C_N. Now construct a clause D that contains all ground atoms over R and the constants 0, 1 that represent a way in which a clause of length three can become true. For example, it contains atoms like R(0, 0, ...) and R(1, 1, ...) representing that the first literal of a clause is true. Actually, for each such atom the clause D contains |C_N| copies. Finally, C_N subsumes D if and only if N is satisfiable.
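The reduction in the proof can be made concrete with a small brute-force Python sketch. Propositional clauses are given DIMACS-style (the integers and helper names are our own illustrative choices), the matcher search simply tries all 0/1 values for the variables $x_P$, and clauses are treated as sets, so the |C_N| copies required under multiset semantics are not needed here. The search is exponential by design: it only illustrates that finding a matcher amounts to finding a satisfying assignment.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Clause3 = Tuple[int, int, int]          # DIMACS-style literals, e.g. (1, -2, 3)
Atom = Tuple[object, ...]               # arguments of one R-atom


def encode(clauses: List[Clause3]) -> List[Atom]:
    """C_N: one R-atom per clause, alternating variable x_P and polarity bit p."""
    atoms = []
    for clause in clauses:
        args: list = []
        for lit in clause:
            args += [f"x{abs(lit)}", 1 if lit > 0 else 0]
        atoms.append(tuple(args))
    return atoms


def make_d() -> Set[Atom]:
    """D: every ground R-atom over {0, 1} in which at least one literal is true.
    Literal i is true iff the value of x_Pi equals its polarity bit p_i."""
    return {a for a in product((0, 1), repeat=6)
            if any(a[2 * i] == a[2 * i + 1] for i in range(3))}


def subsumes(c_n: List[Atom], d: Set[Atom]) -> bool:
    """Brute-force search for a matcher sigma with C_N sigma a subset of D."""
    variables = sorted({t for atom in c_n for t in atom if isinstance(t, str)})
    for values in product((0, 1), repeat=len(variables)):
        sigma: Dict[str, int] = dict(zip(variables, values))
        if all(tuple(sigma.get(t, t) for t in atom) in d for atom in c_n):
            return True
    return False


sat = [(1, 2, 3), (-1, -2, 3), (1, -3, 2)]
unsat = [(a * 1, b * 2, c * 3) for a in (1, -1) for b in (1, -1) for c in (1, -1)]
print(subsumes(encode(sat), make_d()), subsumes(encode(unsat), make_d()))  # True False
```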


In order to be efficient, modern theorem provers need to perform several thousand subsumption checks per second. In the pure first-order case, this is
possible because of indexing and filtering techniques that quickly decide most 
subsumption checks [24, 25, 27-30,33,39, 40, 45-49, 52-54, 56, 59, 61,62]. 

For BS(LRA) (and FOL(LRA)), there also exists research on how to perform 
the subsumption check in general [2,36], but the literature contains no dedicated 
indexing or filtering techniques for the constraint part of the subsumption check. 
In this section and as the main contribution of this paper, we present the first 
such filtering techniques for BS(LRA). But first, we explain how to solve the 
subsumption check for constrained clauses in general. 


First-Order Check. The first step of the subsumption check is exactly the same as in first-order logic without arithmetic. We have to find a substitution σ, also called a matcher, such that C₁σ ⊆ C₂. The only difference is that it is not enough to compute one matcher σ; we have to compute all matchers for C₁σ ⊆ C₂ until we find one that satisfies the implication Λ₂ → (Λ₁σ). For instance, there are two matchers for the clauses Λ₁ || C₁ := x + y > 0 || Q(x, y) and Λ₂ || C₂ := x < 0, y > 0 || Q(x, x) ∨ Q(y, y). The matcher {x ↦ y} satisfies the implication Λ₂ → (Λ₁σ), since it turns x + y > 0 into 2·y > 0, which follows from y > 0; the matcher {y ↦ x} does not, since 2·x > 0 contradicts x < 0. Our own algorithm for finding matchers is in the style of Stillman, except that we continue after we find the first matcher [27,58].
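A backtracking enumeration of all matchers, in the spirit of the approach described above, can be sketched as follows. It is a simplified illustration with its own hypothetical representation (variables as strings, constants as integers), not the authors' implementation, and it may report the same matcher more than once.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

Term = Union[str, int]                        # str = variable, int = constant
Literal = Tuple[bool, str, Tuple[Term, ...]]  # (positive?, predicate, args)


def match_literal(lit: Literal, target: Literal,
                  sigma: Dict[str, Term]) -> Optional[Dict[str, Term]]:
    """Extend sigma so that lit*sigma == target, or return None."""
    (s1, p1, a1), (s2, p2, a2) = lit, target
    if s1 != s2 or p1 != p2 or len(a1) != len(a2):
        return None
    sigma = dict(sigma)                       # keep the caller's sigma intact
    for t, u in zip(a1, a2):
        if isinstance(t, str):                # variable of the subsuming clause
            if sigma.setdefault(t, u) != u:
                return None                   # already bound to something else
        elif t != u:                          # constants must coincide
            return None
    return sigma


def all_matchers(c1: List[Literal], c2: List[Literal],
                 sigma: Optional[Dict[str, Term]] = None) -> Iterator[Dict[str, Term]]:
    """Backtracking enumeration of all sigma with c1*sigma a subset of c2."""
    sigma = {} if sigma is None else sigma
    if not c1:
        yield sigma
        return
    first, rest = c1[0], c1[1:]
    for target in c2:
        extended = match_literal(first, target, sigma)
        if extended is not None:
            yield from all_matchers(rest, c2, extended)


c1 = [(True, 'Q', ('x', 'y'))]                            # x + y > 0 || Q(x, y)
c2 = [(True, 'Q', ('x', 'x')), (True, 'Q', ('y', 'y'))]   # x < 0, y > 0 || Q(x, x) v Q(y, y)
for sigma in all_matchers(c1, c2):
    print(sigma)
# {'x': 'x', 'y': 'x'}
# {'x': 'y', 'y': 'y'}
```

Because matchers are produced lazily by a generator, the caller can stop as soon as one of them passes the constraint implication test.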


Implication Check. The universal closure of the implication Λ₂ → (Λ₁σ) can be decided by any SMT solver for the respective theory after we negate it. Note that the resulting formula

$$\exists x_1, \ldots, x_n.\; \Lambda_2 \land \neg(\Lambda_1\sigma) \quad \text{where } \{x_1, \ldots, x_n\} = \mathrm{vars}(\Lambda_2) \qquad (1)$$

is already in clause normal form and that the formula can be treated as ground, since existential variables can be handled as constants. Intuitively, the universal closure of Λ₂ → (Λ₁σ) asserts that the set of solutions satisfying Λ₂ is a subset of the set of solutions satisfying Λ₁σ. This means a solution to its negation (1) is a solution for Λ₂, but not for Λ₁σ, thus a counterexample to the subset relation.

Fig. 1. Solutions of the constraints Λ₁σ, Λ₂, and Λ₃ depicted as polytopes


Example 7. Let us now look at an example to illustrate the role that formula (1) plays in deciding subsumption. In our example, we have three clauses: Λ₁ || C₁, Λ₂ || C₂, and Λ₃ || C₂, where C₁ := ¬P(x, y) ∨ Q(u, z), C₂ := ¬P(x, y) ∨ Q(2, x), Λ₁ := y ≥ 0, y ≤ u, y ≤ x + z, y ≥ x + z − 2·u, Λ₂ := x ≥ 1, y ≤ 1, y ≥ x − 1, and Λ₃ := x ≥ 2, y ≤ 1, y ≥ x − 2. Our goal is to test whether Λ₁ || C₁ subsumes the other two clauses. As our first step, we try to find a substitution σ such that C₁σ ⊆ C₂. The most general substitution fulfilling this condition is σ := {z ↦ x, u ↦ 2}. Next, we check whether Λ₁σ is implied by Λ₂ and Λ₃. Normally, we would do so by solving formula (1) with an SMT solver, but to help our intuitive understanding, we instead look at their solution sets depicted in Fig. 1. Note that Λ₁σ simplifies to Λ₁σ = y ≥ 0, y ≤ 2, y ≤ 2·x, y ≥ 2·x − 4. Here we see that the solution set for Λ₂ is a subset of the one for Λ₁σ. Hence, Λ₂ implies Λ₁σ, which means that Λ₂ || C₂ is subsumed by Λ₁ || C₁. The solution set for Λ₃ is not a subset of the one for Λ₁σ. For instance, the assignment β₂ := {x ↦ 3, y ↦ 1} is a counterexample and therefore a solution to the respective instance of formula (1). Hence, Λ₁ || C₁ does not subsume Λ₃ || C₂.
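The implication tests of Example 7 can be reproduced exactly with a small Python sketch. Instead of an SMT solver, it uses Fourier-Motzkin elimination, which the text mentions for excess variables, to decide satisfiability of conjunctions of linear atoms over the reals. Since ¬(Λ₁σ) is a disjunction, formula (1) is unsatisfiable exactly when Λ₂ ∧ ¬a is unsatisfiable for every atom a of Λ₁σ. The helper names are hypothetical, and Fourier-Motzkin elimination can blow up on larger inputs, so this is an illustration rather than a practical decision procedure.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Atom  sum(coeffs[v] * v)  <  bound  (strict=True)  or  <=  bound  (strict=False)
Atom = Tuple[Dict[str, Fraction], bool, Fraction]


def atom(coeffs, op: str, bound) -> Atom:
    """E.g. atom({'y': 1, 'x': -1}, '>=', -1) encodes y - x >= -1."""
    c = {v: Fraction(k) for v, k in coeffs.items() if k != 0}
    b = Fraction(bound)
    if op in ('<', '<='):
        return c, op == '<', b
    if op in ('>', '>='):                     # multiply both sides by -1
        return {v: -k for v, k in c.items()}, op == '>', -b
    raise ValueError(op)


def negate(a: Atom) -> Atom:
    """not(t < b) is -t <= -b, and not(t <= b) is -t < -b."""
    coeffs, strict, bound = a
    return {v: -k for v, k in coeffs.items()}, not strict, -bound


def satisfiable(atoms: List[Atom]) -> bool:
    """Fourier-Motzkin elimination over the reals: exact, but can blow up."""
    while True:
        variables = {v for c, _, _ in atoms for v in c}
        if not variables:
            # only atoms of the form 0 < b or 0 <= b are left
            return all((b > 0) if s else (b >= 0) for _, s, b in atoms)
        v = min(variables)                    # any elimination order works
        upper = [a for a in atoms if a[0].get(v, 0) > 0]
        lower = [a for a in atoms if a[0].get(v, 0) < 0]
        rest = [a for a in atoms if a[0].get(v, 0) == 0]
        for cu, su, bu in upper:
            for cl, sl, bl in lower:
                ku, kl = cu[v], -cl[v]        # both positive
                comb = {w: kl * cu.get(w, 0) + ku * cl.get(w, 0)
                        for w in set(cu) | set(cl) if w != v}
                comb = {w: k for w, k in comb.items() if k != 0}
                rest.append((comb, su or sl, kl * bu + ku * bl))
        atoms = rest


def implies(premise: List[Atom], conclusion: List[Atom]) -> bool:
    """Universal closure of premise -> conclusion holds iff formula (1) is unsat."""
    return not any(satisfiable(premise + [negate(a)]) for a in conclusion)


# Example 7: Lambda1*sigma, Lambda2 and Lambda3 over x and y.
lam1_sigma = [atom({'y': 1}, '>=', 0), atom({'y': 1}, '<=', 2),
              atom({'y': 1, 'x': -2}, '<=', 0), atom({'y': 1, 'x': -2}, '>=', -4)]
lam2 = [atom({'x': 1}, '>=', 1), atom({'y': 1}, '<=', 1),
        atom({'y': 1, 'x': -1}, '>=', -1)]
lam3 = [atom({'x': 1}, '>=', 2), atom({'y': 1}, '<=', 1),
        atom({'y': 1, 'x': -1}, '>=', -2)]
print(implies(lam2, lam1_sigma), implies(lam3, lam1_sigma))  # True False
```

Strict atoms stay strict whenever a strict atom takes part in a combination, which is what keeps the elimination exact for mixed strict and non-strict inequalities.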


Excess Variables. Note that in general it is not sufficient to find a substitution σ that matches the first-order parts to also match the theory constraints: C₁σ ⊆ C₂ does not generally imply vars(Λ₁σ) ⊆ vars(Λ₂). In particular, if Λ₁ contains variables that do not appear in the first-order part C₁, then σ does not bind them, and they must be projected away (i.e., existentially quantified) in the implication. We arrive at a variant of (1), that is, $\exists x_1, \ldots, x_n \forall y_1, \ldots, y_m.\; \Lambda_2 \land \neg(\Lambda_1\sigma)$, where {x₁, ..., xₙ} = vars(Λ₂) and {y₁, ..., yₘ} = vars(Λ₁) \ vars(C₁). Our solution to this problem is to normalize all clauses Λ || C by eliminating all excess variables Y := vars(Λ) \ vars(C) such that vars(Λ) ⊆ vars(C) is guaranteed. For linear real arithmetic this is possible with quantifier elimination techniques, e.g., Fourier-Motzkin elimination (FME). Although these techniques can cause the size of Λ to grow exponentially, they often behave well in practice. In fact, we get rid of almost all excess variables in our benchmark examples with simplification techniques based on Gaussian elimination with execution time linear in the number of LRA atoms. Given the precondition Y = ∅ achieved by such elimination techniques, we can compute σ as a matcher for the first-order parts and then directly use it for testing whether the universal closure of Λ₂ → (Λ₁σ) holds. An alternative solution to the issue of excess variables has been proposed: In [2], the substitution σ is decomposed as σ = δτ, where δ is the first-order matcher and τ is a theory matcher, i.e., dom(τ) ⊆ Y and vars(codom(τ)) ⊆ vars(Λ₂). Then, exploiting Farkas' lemma, the computation of τ is reduced to testing the feasibility of a linear program (restricted to matchers that are affine transformations).

The reduction to solving a linear program offers polynomial worst-case com- 
plexity but in practice typically behaves worse than solving the variant with 
quantifier alternations using an SMT solver such as Z3 [36, 42]. 
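The text only states that the simplifications are "based on Gaussian elimination". The following Python sketch is our own reconstruction of one such step: if the constraint contains an equation that defines the excess variable y, that equation is solved for y and substituted into all other atoms (done here by subtracting multiples of the equation). The function name and atom representation are hypothetical; when no defining equation exists, the sketch gives up, and a full method such as FME would be needed.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

# Atom  sum(coeffs[v] * v)  op  bound,  with op in {'<', '<=', '>=', '>', '='}
Atom = Tuple[Dict[str, Fraction], str, Fraction]


def eliminate_by_equation(constraint: List[Atom], y: str) -> Optional[List[Atom]]:
    """Project the excess variable y out of the constraint when an equation
    a*y + t = b with a != 0 defines it; return None if there is no such equation."""
    for i, (c_eq, op, b_eq) in enumerate(constraint):
        if op == '=' and c_eq.get(y, 0) != 0:
            break
    else:
        return None
    result = []
    for j, (c, rel, b) in enumerate(constraint):
        if j == i:
            continue                              # the defining equation is dropped
        f = Fraction(c.get(y, 0)) / c_eq[y]       # subtract f times the equation
        new = {v: c.get(v, 0) - f * c_eq.get(v, 0) for v in set(c) | set(c_eq)}
        result.append(({v: k for v, k in new.items() if k != 0}, rel, b - f * b_eq))
    return result


# Exists y. (y - x = 1 and y <= 3)   is equivalent to   x <= 2
lam = [({'y': Fraction(1), 'x': Fraction(-1)}, '=', Fraction(1)),
       ({'y': Fraction(1)}, '<=', Fraction(3))]
print(eliminate_by_equation(lam, 'y'))  # [({'x': Fraction(1, 1)}, '<=', Fraction(2, 1))]
```

Subtracting a multiple of an equation never changes the direction of an inequality, which is why this step is cheap and safe, unlike FME, which has to combine every upper bound with every lower bound.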


Filtering First-Order Literals. Even though deciding implication of theory constraints is in practice more expensive than constructing a matcher and deciding inclusion of first-order literals, we still incorporate some lightweight filters for our evaluation. Inspired by Schulz [54], we choose three features, so that every feature f maps clauses to ℕ₀, and f(C₁) ≤ f(C₂) is necessary for C₁σ ⊆ C₂.

The features are: |C⁺|, the number of positive first-order literals in C; |C⁻|, the number of negative first-order literals in C; and the number of occurrences of constants in C.
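The three features and the resulting filter fit in a few lines of Python. This is a minimal sketch assuming a representation in which variables are strings and constants are integers; the function names are hypothetical.

```python
from typing import List, Tuple, Union

Term = Union[str, int]                        # str = variable, int = constant
Literal = Tuple[bool, str, Tuple[Term, ...]]  # (positive?, predicate, args)


def features(clause: List[Literal]) -> Tuple[int, int, int]:
    """(|C+|, |C-|, number of constant occurrences) of a first-order part."""
    pos = sum(1 for positive, _, _ in clause if positive)
    consts = sum(1 for _, _, args in clause for t in args if not isinstance(t, str))
    return pos, len(clause) - pos, consts


def may_subsume(c1: List[Literal], c2: List[Literal]) -> bool:
    """f(C1) <= f(C2) for every feature f; False rejects the pair before matching."""
    return all(a <= b for a, b in zip(features(c1), features(c2)))


c1 = [(False, 'P', ('x', 'y')), (True, 'Q', ('x', 3))]
c2 = [(False, 'P', ('u', 'v')), (True, 'Q', ('u', 3)), (True, 'R', (1,))]
print(features(c2), may_subsume(c1, c2), may_subsume(c2, c1))  # (2, 1, 2) True False
```

Since a clause's feature vector does not depend on the partner clause, a prover can compute it once per clause and reuse it for every test.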


Sample Point Heuristic. The majority of subsumption tests fail because we 
cannot find a fitting substitution for their first-order parts. In our experiments, 
between 66.5% and 99.9% of subsumption tests failed this way. This means our 
tool only has to check in less than 33.5% of the cases whether one theory con- 
straint implies the other. Despite this, our tool spends more time on implication 
checks than on the first-order part of the subsumption tests without filtering on 
the constraint implication tests. The reason is that constraint implication tests 
are typically much more expensive than the first-order part of a subsumption 
test. For this reason, we developed the sample point heuristic that is much faster 
to execute than a full constraint implication test, but still filters out the majority 
of implications that do not hold (in our experiments between 93.8% and 100%). 

The idea behind the sample point heuristic is straightforward. We store for each clause Λ || C a sample solution for its theory constraint Λ. Before we execute a full constraint implication test, we simply evaluate whether the sample solution β for Λ₂ is also a solution for Λ₁σ. If this is not the case, then β is a solution for (1) and a counterexample for the implication. If β is a solution for Λ₁σ, then the heuristic returns unknown and we have to execute a full constraint implication test, i.e., solve the SMT problem (1).
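The heuristic amounts to evaluating each atom of Λ₁σ under the stored assignment. A minimal Python sketch with exact rational arithmetic follows; the names are hypothetical, and the assignment is assumed to cover vars(Λ₁σ), which Definition 5 guarantees through vars(Λ₁σ) ⊆ vars(Λ₂). The data are taken from Examples 7 and 8.

```python
import operator
from fractions import Fraction
from typing import Dict, List, Tuple

Atom = Tuple[Dict[str, Fraction], str, Fraction]  # sum(coeffs[v]*v) op bound
OPS = {'<': operator.lt, '<=': operator.le, '=': operator.eq,
       '>=': operator.ge, '>': operator.gt}


def holds(atom: Atom, beta: Dict[str, Fraction]) -> bool:
    """Evaluate one linear atom under the assignment beta."""
    coeffs, op, bound = atom
    return OPS[op](sum(k * beta[v] for v, k in coeffs.items()), bound)


def sample_point_filter(lam1_sigma: List[Atom], beta2: Dict[str, Fraction]) -> str:
    """'reject' if the stored sample solution beta2 of Lambda2 falsifies
    Lambda1*sigma (beta2 then solves formula (1)); otherwise 'unknown'."""
    if all(holds(a, beta2) for a in lam1_sigma):
        return 'unknown'          # the full implication test has to decide
    return 'reject'


F = Fraction
lam1_sigma = [({'y': F(1)}, '>=', F(0)), ({'y': F(1)}, '<=', F(2)),
              ({'y': F(1), 'x': F(-2)}, '<=', F(0)),
              ({'y': F(1), 'x': F(-2)}, '>=', F(-4))]
beta1 = {'x': F(2), 'y': F(1)}   # sample solution of Lambda2
beta2 = {'x': F(3), 'y': F(1)}   # sample solution of Lambda3
print(sample_point_filter(lam1_sigma, beta1), sample_point_filter(lam1_sigma, beta2))
# unknown reject
```

The evaluation is linear in the size of Λ₁σ, which is why it is so much cheaper than solving (1).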

Often it is possible to get our sample solutions for free. Theorem provers 
based on hierarchic superposition typically check for every new clause Λ || C whether Λ is satisfiable in order to eliminate tautologies. This means we can
already use this tautology check to compute and store a sample solution for 
every new clause without extra cost. We only need to pick a solver for the check 
that returns a solution as a certificate of satisfiability. Although the SCL(T) 
calculus never learns any tautologies, it is also possible to get a sample solution 
for free as part of its conflict analysis [11]. 
Example 8. We revisit Example 7 to illustrate the sample point heuristic. During the tautology check for Λ₂ || C₂ and Λ₃ || C₂, we determined that β₁ := {x ↦ 2, y ↦ 1} is a sample solution for Λ₂ and β₂ := {x ↦ 3, y ↦ 1} a sample solution for Λ₃. Since Λ₂ implies Λ₁σ, all sample solutions for Λ₂ automatically satisfy Λ₁σ. This is the reason why the sample point heuristic never filters out an implication that actually holds, i.e., it returns unknown when we test whether Λ₂ implies Λ₁σ. The assignment β₂, on the other hand, does not satisfy Λ₁σ. Hence, the sample point heuristic correctly claims that Λ₃ does not imply Λ₁σ. Note that we could also have chosen β₃ as the sample point for Λ₃. In this case, the sample point heuristic would also return unknown for the implication Λ₃ → Λ₁σ, although the implication does not hold.


Trivial Cases. Subsumption tests become much easier if the constraint Λᵢ of one of the participating clauses is empty. We use two heuristic filters to exploit this fact. We highlight them here because they already exclude some subsumption tests before we reach the sample point heuristic in our implementation.

The empty conclusion heuristic exploits that Λ₁ is valid if Λ₁ is empty. In this case, all implications Λ₂ → (Λ₁σ) hold because Λ₁σ evaluates to true under any assignment. So by checking whether Λ₁ = ∅, we can quickly determine whether Λ₂ → (Λ₁σ) holds for some pairs of clauses. Note that, in contrast to the sample point heuristic, this heuristic is used to find valid implications.

The empty premise test exploits that Λ₂ is valid if Λ₂ is empty. In this case, an implication Λ₂ → (Λ₁σ) may only hold if Λ₁σ simplifies to the empty set as well. This is the case because any inequality in the canonical form $\sum_{i=1}^{n} a_i x_i \triangleleft c$ with ◁ ∈ {<, ≤} either simplifies to true (because aᵢ = 0 for all i = 1, ..., n and 0 ◁ c holds) and can be removed from Λ₁σ, or the inequality eliminates at least one assignment as a solution for Λ₁σ [51]. So if Λ₂ = ∅, we check whether Λ₁σ simplifies to the empty set instead of solving the SMT problem (1).
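Both trivial cases can be sketched together. The Python below assumes atoms in the form (coefficients, relation, bound). An atom whose coefficients are all zero is dropped when 0 ◁ c holds, as in the argument above. The return value None signals that neither case applies and the later stages have to decide; the function names are hypothetical.

```python
import operator
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

Atom = Tuple[Dict[str, Fraction], str, Fraction]  # sum(coeffs[v]*v) op bound
OPS = {'<': operator.lt, '<=': operator.le, '=': operator.eq,
       '>=': operator.ge, '>': operator.gt}


def simplify(constraint: List[Atom]) -> List[Atom]:
    """Drop atoms with all-zero coefficients that hold (0 op c).  Variable-free
    atoms that fail are kept: they exclude every assignment."""
    return [a for a in constraint
            if any(k != 0 for k in a[0].values()) or not OPS[a[1]](0, a[2])]


def trivial_cases(lam2: List[Atom], lam1_sigma: List[Atom]) -> Optional[bool]:
    """Decide Lambda2 -> Lambda1*sigma if a trivial case applies, else None."""
    if not lam1_sigma:                 # empty conclusion heuristic: accept
        return True
    if not lam2:                       # empty premise test: decides either way
        return not simplify(lam1_sigma)
    return None                        # sample point / full implication test


F = Fraction
print(trivial_cases([({'x': F(1)}, '<=', F(0))], []),
      trivial_cases([], [({}, '<=', F(1))]),
      trivial_cases([], [({'x': F(1)}, '<=', F(1))]),
      trivial_cases([({'x': F(1)}, '<=', F(0))], [({'x': F(1)}, '<=', F(1))]))
# True True False None
```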


Pipeline. We call our approach a pipeline since it combines multiple procedures, which we call stages, that vary in complexity and are independent in principle, for the overall aim of efficiently testing subsumption. Pairs of clauses that "make it through" all stages are those for which the subsumption relation holds. The pipeline is designed with two goals in mind: (1) to reject as many pairs of clauses as early as possible, and (2) to place stages further towards the end of the pipeline the more expensive they are.

The pipeline consists of six stages, all of which are mentioned above. We divide the pipeline into two phases: the first-order phase (FO-phase), consisting of two stages, and the constraint phase (C-phase), consisting of four stages. First-order filtering rejects all pairs of clauses for which f(C₁) ≤ f(C₂) fails for at least one feature f. Then, matching constructs all matchers σ such that C₁σ ⊆ C₂. Every matcher is individually tested in the constraint phase. Technically, this means that the input of all following stages is not just a pair of clauses, but a triple of two clauses and a matcher. The constraint phase then proceeds with the empty conclusion heuristic and the empty premise test to accept (resp. reject) all trivial cases of the constraint implication test. The next stage is the sample point heuristic. If the sample solution β₂ for Λ₂ is no solution for Λ₁σ (i.e., β₂ ⊭ Λ₁σ), then the matcher σ is rejected. Otherwise (i.e., β₂ ⊨ Λ₁σ), the implication test Λ₂ → (Λ₁σ) is performed by solving the SMT problem (1) to produce the overall result of the pipeline and finally determine whether subsumption holds.

Algorithm 1: Saturation prover used for evaluation

Input: A set N of clauses.
Output: ⊥ or "unknown".
1   U := {C ∈ N | |C| = 1}
2   while U ≠ ∅ do
3       M := ∅
4       foreach C ∈ U do M := M ∪ resolvents(C, N)
5       if ⊥ ∈ M then return ⊥
6       reduce M using N (forward subsumption)
7       if M = ∅ then return "unknown"
8       reduce N using M (backward subsumption)
9       U := {C ∈ M | |C| = 1}
10      N := N ∪ M
11  end
12  return "unknown"
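The control flow of the six stages can be sketched as a higher-order Python function. Every stage is passed in as a parameter with a hypothetical signature (for instance, functions like the sketches given earlier in this section could be adapted to fit), so the sketch only fixes the order of the stages and their early exits, not how each stage is implemented.

```python
from typing import Any, Callable, Iterable, Optional


def pipeline_subsumes(
        c1: Any, c2: Any, *,
        fo_filter: Callable[[Any, Any], bool],
        matchers: Callable[[Any, Any], Iterable[Any]],
        empty_conclusion: Callable[[Any, Any], bool],
        empty_premise: Callable[[Any, Any, Any], Optional[bool]],
        sample_point: Callable[[Any, Any, Any], bool],
        implication: Callable[[Any, Any, Any], bool]) -> bool:
    """Does c1 subsume c2?  Cheap stages run first and may end the test early.
    All six stage functions are hypothetical parameters of this sketch."""
    if not fo_filter(c1, c2):                       # FO-phase: feature filter
        return False
    for sigma in matchers(c1, c2):                  # FO-phase: each matcher
        if empty_conclusion(c1, sigma):             # C-phase: Lambda1 empty
            return True
        verdict = empty_premise(c1, c2, sigma)      # C-phase: Lambda2 empty
        if verdict is not None:
            if verdict:
                return True
            continue
        if not sample_point(c1, c2, sigma):         # C-phase: beta2 falsifies
            continue
        if implication(c1, c2, sigma):              # C-phase: solve formula (1)
            return True
    return False


# Stub stages: the sample point rejects the only matcher, so the expensive
# implication stage is never called.
log = []
result = pipeline_subsumes(
    'C1', 'C2',
    fo_filter=lambda a, b: True,
    matchers=lambda a, b: [{'x': 'y'}],
    empty_conclusion=lambda a, s: False,
    empty_premise=lambda a, b, s: None,
    sample_point=lambda a, b, s: False,
    implication=lambda a, b, s: log.append('implication') is None)  # logs, returns True
print(result, log)  # False []
```

A matcher rejected by a C-phase stage only moves the loop on to the next matcher, which mirrors the requirement that all matchers be tried before subsumption is rejected.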


4 Experimentation 


In order to evaluate our new approach on benchmark instances derived from three BS(LRA) applications, we implemented all presented techniques and their combination in the form of a pipeline in the theorem prover SPASS-SPL, a prototype for BS(LRA) reasoning.

Note that SPASS-SPL contains more than one approach for BS(LRA) reasoning, e.g., the Datalog hammer for HBS(LRA) reasoning [10]. These modes of operation are independent of each other, and the desired mode is chosen via a command-line option. The reasoning approach discussed here is the
current default option. On the first-order side, SPASS-SPL consists of a sim- 
ple saturation prover based on hierarchic unit resolution, see Algorithm 1. It 
resolves unit clauses with other clauses until either the empty clause is derived 
or no new clauses can be derived. Note that this procedure is only complete 
for Horn clauses. For arithmetic reasoning, SPASS-SPL relies on SPASS-SATT, 
our sound and complete CDCL(LA) solver for quantifier-free linear real and 
linear mixed/integer arithmetic [12]. SPASS-SATT implements a version of the 
dual simplex algorithm fine-tuned towards SMT solving [16]. In order to ensure 
soundness, SPASS-SATT represents all numbers with the help of the arbitrary- 
precision arithmetic library FLINT [31]. This means all calculations, including 
the implication test and the sample point heuristic, are always exact and thus 
free of numerical errors. The most relevant part of SPASS-SPL with regard to this paper is that it performs tautology and subsumption deletion to eliminate redundant clauses. As a preprocessing step, SPASS-SPL eliminates all tautologies from the set of input clauses. Similarly, the function resolvents(C, N) (see Line 4 of Algorithm 1) filters out all newly derived clauses that are tautologies. Note that we also use these tautology checks to eliminate all excess variables and to store sample solutions for all remaining clauses. After each iteration of the algorithm, we also check for subsumed clauses. We first eliminate newly generated clauses by forward subsumption (see Line 6 of Algorithm 1), then use the remaining clauses for backward subsumption (see Line 8 of Algorithm 1).

Table 1. Overview of how many clause pairs advance in the pipeline (top to bottom)

                              lc                      bakery, tad             All
All                           1 244 819k              196 437k                1 441 256k
Stage: Filtering              61.21%                  85.03%                  64.45%
f(C₁) ≤ f(C₂)                 761 905k   61.2061%     167 025k   85.0274%     928 931k   64.4540%
Stage: Matching               0.02%                   39.83%                  7.18%
C₁σ ⊆ C₂                      131k       0.0106%      66 531k    33.8694%     66 664k    4.6254%
Stage: Empty (pre./con.)      44.73%                  100.00%                 99.89%
⊭ Λ₁σ, ⊭ Λ₂                   59k        0.0047%      66 531k    33.8694%     66 591k    4.6203%
Stage: Sample point           59.28%                  0.12%                   0.18%
β₂ ⊨ Λ₁σ                      35k        0.0028%      82k        0.0416%      117k       0.0081%
Stage: Implication            95.51%                  100.00%                 98.66%
Subsumes                      33k        0.0027%      82k        0.0416%      115k       0.0080%

Table 2. An overview of the accuracy of non-perfect pipeline stages

                      Specificity / Sensitivity          Pos. / Neg. Predictive Value
Test                  lc        bakery, tad  All          lc        bakery, tad  All
FO Filtering          0.38797   0.14979      0.35552      0.00013   0.00049      0.00020
FO Matching           0.99996   0.60196      0.92841      0.78456   0.00123      0.00275
Empty Conclusion      0.70973   0.00000      0.00103      0.54474   0.00123      0.00173
Sample Point          0.93864   1.00000      0.99998      0.95510   1.00000      0.98653
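To make the control flow of Algorithm 1 concrete, here is a Python sketch of its loop for ground propositional clauses only. This is a deliberate simplification and not the BS(LRA) prover: unit resolution stands in for hierarchic unit resolution, subsumption is plain subset inclusion, and tautology filtering is omitted. The comments refer to the lines of Algorithm 1; the function names are hypothetical.

```python
from typing import FrozenSet, Set, Union

Clause = FrozenSet[int]   # ground propositional clause, DIMACS-style literals


def resolvents(unit: Clause, n: Set[Clause]) -> Set[Clause]:
    """All unit resolvents of the unit clause with clauses of n."""
    (lit,) = unit
    return {d - {-lit} for d in n if -lit in d}


def saturate(n: Set[Clause]) -> Union[Clause, str]:
    """Algorithm 1 on propositional clauses; subsumption is subset inclusion.
    Returns the empty clause (bottom) or "unknown"."""
    n = set(n)
    u = {c for c in n if len(c) == 1}                        # line 1
    while u:                                                 # line 2
        m: Set[Clause] = set()                               # line 3
        for c in u:                                          # line 4
            m |= resolvents(c, n)
        if frozenset() in m:                                 # line 5
            return frozenset()
        m = {c for c in m if not any(d <= c for d in n)}     # line 6: forward
        if not m:                                            # line 7
            return "unknown"
        n = {d for d in n if not any(c <= d for c in m)}     # line 8: backward
        u = {c for c in m if len(c) == 1}                    # line 9
        n |= m                                               # line 10
    return "unknown"                                         # line 12


horn_unsat = {frozenset({1}), frozenset({-1, 2}), frozenset({-2})}
print(saturate(horn_unsat))  # frozenset()
```

Forward subsumption also discards resolvents that are already in N, which is what lets the loop reach M = ∅ and stop with "unknown" on satisfiable inputs.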


Benchmarks. Our benchmarking instances come out of three different appli- 
cations. (1.) A supervisor for an automobile lane change assistant, formulated 
in the Horn fragment of BS(LRA) [9,10] (five instances, referred to as lc in
aggregate). (2.) The formalization of reachability for non-deterministic timed 
automata, formulated in the non-Horn fragment of BS(LRA) [20] (one instance, 
referred to as tad). (3.) Formalizations of variants of mutual exclusion proto- 
cols, such as the bakery protocol [38], also formulated in the non-Horn fragment 
of BS(LRA) [19] (one instance, referred to as bakery). The machine used for 
benchmarking features an Intel Xeon W-1290P CPU (10 cores, 20 threads, up 
to 5.2GHz) and 64GiB DDR4-2933 ECC main memory. Runtime was limited 
to ten minutes, and memory usage was not limited. 
Table 3. Evaluation of the sample point heuristic

                                              lc        bakery, tad   All
Bottleneck (C time ÷ FO time)
  without sample point                        127       2757          14867
  with sample point                           78        32            89
Avg. pipeline runtime in μs
  without sample point                        0.0315    89.9401       0.5189
  with sample point                           0.0311    1.4150        0.2197
Speedup (C time without ÷ with)               1.63      137.88        124.16
Benefit-to-cost (C time saved ÷ taken)        6.74      181.72        163.72


Evaluation. In Table 1 we give an overview of how many pairs of clauses advance how far in the pipeline (in thousands). Rows that name a stage of the pipeline show which portion of pairs of clauses were kept, relative to the previous stage. The remaining rows refer to (virtual) sets of clauses, their absolute size, and their size relative to the number of attempted tests, as well as the condition(s) established. The three groups of columns refer to groups of benchmark instances. Results vary greatly between lc and the aggregate of bakery and tad. In lc the relative number of subsumed clauses is significantly smaller (0.0027% compared to 0.0416%). FO Matching eliminates a large number of pairs in lc, because the number of predicate symbols and their arity (lc1, ..., lc4: 36 predicates, arities up to 5; lc5: 53 predicates, arities up to 12) is greater than in bakery (11 predicates, all of arity 2) and tad (4 predicates, all of arity 2).


Binary Classifiers. To evaluate the performance of each stage of the proposed 
test pipeline, we view each stage individually as a binary classifier on pairs 
of constrained clauses. The two classes we consider are “subsumes” (positive 
outcome) and “does not subsume” (negative outcome). Each stage of the pipeline 
computes a prediction on the actual result of the overall pipeline. We are thus 
interested in minimizing two kinds of errors: (1) When one stage of the pipeline 
predicts that the subsumption test will succeed (the prediction is positive) but
it fails (the actual result is negative), called false positive (FP). (2) When one 
stage of the pipeline predicts that the subsumption test will fail (the prediction 
is negative) but it succeeds (the actual result is positive), called false negative 
(FN). Dually, a correct prediction is called true positive (TP) and true negative 
(TN). For each stage, at least one kind of error is excluded by design: First- 
order filtering and the sample point heuristic never produce false negatives. The 
empty conclusion heuristic never produces false positives. The empty premise 
test is perfect, i.e. it neither produces false positives nor false negatives, with the 
caveat of not always being applicable. The last stage (implication test) decides 
the overall result of the pipeline, and thus is also perfect. For evaluation of binary 
classifiers, we use four different measures (two symmetric pairs): 


$$\mathrm{SPC} = \frac{TN}{TN + FP} \qquad \mathrm{PPV} = \frac{TP}{TP + FP} \qquad (2)$$
The first pair, specificity (SPC) and positive predictive value (PPV), see (2), is relevant only in the presence of false positives (the measures approach 1 as FP approaches 0).


$$\mathrm{SEN} = \frac{TP}{TP + FN} \qquad \mathrm{NPV} = \frac{TN}{TN + FN} \qquad (3)$$


The second pair, sensitivity (SEN) and negative predictive value (NPV), see (3), is relevant only in the presence of false negatives (the measures approach 1 as FN approaches 0). Specificity (resp. sensitivity) might be considered the "success rate" in our setup. They answer the question: "Given the actual result of the pipeline is 'not subsumed' (resp. 'subsumed'), in how many cases does this stage predict correctly?" A specificity (resp. sensitivity) of 0.99 means that the classifier produces a false positive (resp. negative), i.e., a wrong prediction, in one out of one hundred actually negative (resp. positive) cases. Both measures are independent of the prevalence of particular actual results, i.e., the measures are not biased by instances that feature many (or few) subsumed clauses. On the other hand, positive and negative predictive value are biased by prevalence. They answer the following question: "Given this stage of the pipeline predicts 'subsumed' (resp. 'not subsumed'), how likely is it that the actual result indeed is 'subsumed' (resp. 'not subsumed')?"
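The four measures follow directly from the confusion counts of a stage. The Python sketch below computes them and uses made-up counts (not taken from Tables 1 and 2) to show the prevalence effect described above: a stage with specificity 0.99 can still have a positive predictive value of only 0.5 when actual positives are rare.

```python
import math


def classifier_measures(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Specificity and PPV (formula (2)), sensitivity and NPV (formula (3)).
    A measure whose denominator is zero is reported as NaN."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan

    return {'SPC': ratio(tn, tn + fp), 'PPV': ratio(tp, tp + fp),
            'SEN': ratio(tp, tp + fn), 'NPV': ratio(tn, tn + fn)}


# Hypothetical counts for a stage without false negatives: 1 actual positive,
# 100 actual negatives, one of which is wrongly predicted positive.
m = classifier_measures(tp=1, fp=1, tn=99, fn=0)
print(m)  # {'SPC': 0.99, 'PPV': 0.5, 'SEN': 1.0, 'NPV': 1.0}
```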

In Table 2 we present for all non-perfect stages of the pipeline specificity (for those that produce false positives) and sensitivity (for those that produce false negatives) as well as the (positive/negative) predictive value. Note that the sample point heuristic has an exceptionally high specificity, still above 93% in the benchmarks where it performed worst. For the benchmarks bakery and tad it even performs perfectly. Combined, this gives a specificity of above 99.99%. Considering FO Filtering, we expect limited performance, since the structure of terms in BS is flat compared to the rich structure of terms as trees in full first-order logic. This is evidenced by a comparatively low specificity of 35%. However, this classifier is very cheap to compute, so it pays for itself. FO Matching is a much better classifier, at an aggregate specificity of 93%. Even though the underlying matching problem is NP-complete (cf. Lemma 6), this is not problematic in practice.


Runtime. In Table 3 we focus on the runtime improvement achieved by the sample 
point heuristic. In the first two lines (Bottleneck), we highlight how much slower 
testing implication of constraints (the C-phase) is compared to treating the first- 
order part (the FO-phase). This is equivalent to the time taken for the C-phase 
per pair of clauses (that reach at least the first C-phase) divided by the time taken 
for the FO-phase per pair of clauses. We see that without the sample point heuris- 
tic, we can expect the constraint implication test to take hundreds to thousands 
of times longer than the FO-phase. Adding the sample point heuristic decreases 
this ratio to below one hundred. In the third and fourth lines (avg. pipeline runtime) we do not give a ratio, but the average time it takes to compute the whole pipeline. We
achieve millions of subsumption checks per second. In the fifth line (Speedup), we 
take the time that all C-phases combined take per pair of clauses that reach at 
least the first C-phase, and take the ratio to the same time without applying the 
sample point heuristic. In the sixth line (Benefit-to-cost), we consider the time 
taken to compute the sample point vs. the time it saves. In the aggregate, the benefit is about two orders of magnitude greater than the cost; for lc alone, it is still almost seven times greater.
5 Conclusion


Our next step will be the integration of the subsumption test in the backward 
subsumption procedure of an SCL based reasoning procedure for BS(LRA) [11] 
which is currently under development. 

There are various ways to improve the sample point heuristic. One improve- 
ment would be to store and check multiple sample points per clause. For instance, 
whenever the sample point heuristic fails and the implication test for Λ₂ → (Λ₁σ) also fails, store the solution to (1) as an additional sample point for Λ₂. The new sample point will filter out any future implication tests with Λ₁σ or similar
constraints. However, testing too many sample points might lead to costs out- 
weighing benefits. A potential solution to this problem would be score-based 
garbage collection, as done in SAT solvers [57]. Another way to store and check 
multiple sample points per clause is to store a compact description of a set of 
points that is easy to check against. For instance, we can store the center point 
and edge length of the largest orthogonal hypercube contained in the solutions 
of a constraint, which is equivalent to infinitely many sample points. Computing 
the largest orthogonal hypercube for an LRA constraint is not much harder than 
finding a sample solution [14]. Checking whether a cube is contained in an LRA 
constraint works almost the same as evaluating a sample point [14]. 
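The containment check for a cube can be sketched in Python. A linear term attains its maximum over an axis-aligned cube at a corner, namely a·center + (edge/2)·Σ|aᵢ|, so each atom needs one evaluation, as for a sample point (edge length 0 reduces to exactly that). The sketch only checks containment; computing the largest such cube, as referenced above, is not shown, and the names and the example cube are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Atom  sum(coeffs[v] * v)  (< if strict else <=)  bound
Atom = Tuple[Dict[str, Fraction], bool, Fraction]


def cube_inside(constraint: List[Atom], center: Dict[str, Fraction],
                edge: Fraction) -> bool:
    """Is the axis-aligned cube with this center and edge length contained in
    the solution set of the constraint?  Over the cube, a linear term attains
    its maximum at a corner: a.center + edge/2 * sum(|a_i|)."""
    half = Fraction(edge) / 2
    for coeffs, strict, bound in constraint:
        worst = (sum(k * center[v] for v, k in coeffs.items())
                 + half * sum(abs(k) for k in coeffs.values()))
        if (worst >= bound) if strict else (worst > bound):
            return False
    return True


F = Fraction
# The triangle x >= 1, y <= 1, y >= x - 1 (Lambda2 in Example 7) in <= form.
lam2 = [({'x': F(-1)}, False, F(-1)), ({'y': F(1)}, False, F(1)),
        ({'x': F(1), 'y': F(-1)}, False, F(1))]
center = {'x': F(5, 4), 'y': F(3, 4)}
print(cube_inside(lam2, center, F(1, 2)), cube_inside(lam2, center, F(1)))  # True False
```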

Although we developed our sample point technique for the BS(LRA) fragment, it also works for the overall FOL(LRA) clause fragment, because this extension does not affect the LRA constraint part of clauses.
From an automated reasoning perspective, satisfiability of the FOL(LRA) and 
BS(LRA) fragments (clause sets) is undecidable in both cases. Actually, satisfi- 
ability of a BS(LRA) clause set is already undecidable if the first-order part is 
restricted to a single monadic predicate [32]. The first-order part of BS(LRA) is 
decidable and therefore enables effective guidance for an overall reasoning procedure [11]. From an application perspective, the BS(LRA) fragment already encompasses a number of (sub)languages used in practice. For example, timed automata [3]
and a number of extensions thereof are contained in the BS(LRA) fragment [60]. 

We also believe that the sample point heuristic will speed up the constraint 
implication test for FOL(LIA), first-order clauses over linear integer arithmetic, 
FOL(NRA), i.e., first-order clauses over non-linear real arithmetic, and other 
combinations of FOL with arithmetic theories. However, the non-linear case will 
require a more sophisticated setup due to the nature of test points in this case, 
e.g., a solution may contain root expressions. 
Abstract. Problems in many theories axiomatised by unit equalities (UEQ), such as groups, loops, lattices, and other algebraic structures, are notoriously difficult for automated theorem provers to solve. Consequently, there has been considerable effort over decades in developing techniques to handle these theories, notably in the context of Knuth-Bendix completion and derivatives. The superposition calculus is a generalisation of completion to full first-order logic; however, it does not carry over all the refinements that were developed for completion, and is therefore not a strict generalisation. This means that (i) as of today, even state-of-the-art provers for first-order logic based on the superposition calculus, while more general, are outperformed in UEQ by provers based on completion, and (ii) the sophisticated techniques developed for completion are not available in any problem which is not in UEQ. In particular, this includes key simplifications such as ground joinability, which have been known for more than 30 years. In fact, all previous completeness proofs for ground joinability rely on proof orderings and proof reductions, which are not easily extensible to general clauses together with redundancy elimination. In this paper we address this limitation and extend superposition with ground joinability, and show that under an adapted notion of redundancy, simplifications based on ground joinability preserve completeness. Another recently explored simplification in completion is connectedness. We extend this notion to "ground connectedness" and show superposition is complete with both connectedness and ground connectedness. We implemented ground joinability and connectedness in a theorem prover, iProver, the former using a novel algorithm which we also present in this paper, and evaluated them over the TPTP library with encouraging results.


Keywords: Superposition - Ground joinability - Connectedness - 
Closure redundancy - First-order theorem proving 


1 Introduction 


Automated theorem provers based on equational completion [4], such as Wald- 
meister, MaedMax or Twee [13,21,25], routinely outperform superposition-based 
provers on unit equality problems (UEQ) in competitions such as CASC [22], 
despite the fact that the superposition calculus was developed as a generalisation 
of completion to full clausal first-order logic with equality [19]. One of the main
ingredients for their good performance is the use of ground joinability criteria for 
the deletion of redundant equations [1], among other techniques. However, exist- 
ing proofs of refutational completeness of deduction calculi wrt. these criteria 
are restricted to unit equalities and rely on proof orderings and proof reduc- 
tions [1,2,4], which are not easily extensible to general clauses together with 
redundancy elimination. 

Since completion provers perform very poorly (or not at all) on non-UEQ 
problems (relying at best on incomplete transformations to unit equality [8]), this 
motivates an attempt to transfer those techniques to the superposition calculus 
and prove their completeness, so as to combine the generality of the superposition 
calculus with the powerful simplification rules of completion. To our knowledge, 
no prover for first-order logic incorporates ground joinability redundancy criteria, 
except for particular theories such as associativity-commutativity (AC) [20]. 

For instance, if $f(x,y) \approx f(y,x)$ is an axiom, then the equation $f(x, f(y,z)) \approx f(x, f(z,y))$ is redundant, but this cannot be justified by any simplification rule in the superposition calculus. On the other hand, a completion prover which implements ground joinability can easily delete the latter equation wrt. the former. We show that ground joinability can be enabled in the superposition calculus without compromising completeness.

As another example, the simplification rule in completion can use $f(x) \approx s$ (when $f(x) \succ s$) to rewrite $f(a) \approx t$ regardless of how $s$ and $t$ compare, while the corresponding demodulation rule in superposition can only rewrite if $s \prec t$. Our "encompassment demodulation" rule matches the former, while also being complete in the superposition calculus.

In [11] we introduced a novel theoretical framework for proving complete- 
ness of the superposition calculus, based on an extension of Bachmair-Ganzinger 
model construction [5], together with a new notion of redundancy called “closure 
redundancy”. We used it to prove that certain AC joinability criteria, long used 
in the context of completion [1], could also be incorporated in the superposition 
calculus for full first-order logic while preserving completeness. 

In this paper, we extend this framework to show the completeness of the 
superposition calculus extended with: (i) a general ground joinability simplifi- 
cation rule, (ii) an improved encompassment demodulation simplification rule, 
(iii) a connectedness simplification rule extending [3,21], and (iv) a new ground 
connectedness simplification rule. The proof of completeness that enables these 
extensions is based on a new encompassment closure ordering. In practice, these 
extensions help superposition to be competitive with completion in UEQ problems, and improve the performance on non-UEQ problems, which currently do not benefit from these techniques at all.

We also present a novel incremental algorithm to check ground joinability, 
which is very efficient in practice; this is important since ground joinability can 
be an expensive criterion to test. Finally, we discuss some of the experimental 
results we obtained after implementing these techniques in iProver [10, 16]. 

The paper is structured as follows. In Sect. 2 we define some basic notions to 
be used throughout the paper. In Sect. 3 we define the closure ordering we use to 
prove redundancies. In Sect. 4 we present redundancy criteria for demodulation,
ground joinability, connectedness, and ground connectedness. We prove their 
completeness in the superposition calculus, and discuss a concrete algorithm for 
checking ground joinability, and how it may improve on the algorithms used in 
e.g. Waldmeister [13] or Twee [21]. In Sect. 5 we discuss experimental results. 


2 Preliminaries 


We consider a signature consisting of a finite set of function symbols and the 
equality predicate as the only predicate symbol. We fix a countably infinite set 
of variables. First-order terms are defined in the usual manner. Terms without 
variables are called ground terms. A literal is an unordered pair of terms with either positive or negative polarity, written $s \approx t$ and $s \not\approx t$ respectively (we write $s \mathrel{\dot\approx} t$ to mean either of the former two). A clause is a multiset of literals.
Collectively terms, literals, and clauses will be called expressions. 

A substitution is a mapping from variables to terms which is the identity 
for all but finitely many variables. An injective substitution onto variables is 
called a renaming. If $e$ is an expression, we denote application of a substitution $\sigma$ by $e\sigma$, replacing all variables with their image in $\sigma$. Let $\mathrm{GSubs}(e) = \{\sigma \mid e\sigma \text{ is ground}\}$ be the set of ground substitutions for $e$. Overloading this notation for sets we write $\mathrm{GSubs}(E) = \{\sigma \mid \forall e \in E.\ e\sigma \text{ is ground}\}$. Finally, we write e.g. $\mathrm{GSubs}(e_1, e_2)$ instead of $\mathrm{GSubs}(\{e_1, e_2\})$. The identity substitution is denoted by $\varepsilon$.

A substitution $\theta$ is more general than $\sigma$ if $\theta\rho = \sigma$ for some substitution $\rho$ which is not a renaming. If $s$ and $t$ can be unified, that is, if there exists $\sigma$ such that $s\sigma = t\sigma$, then there also exists the most general unifier, written $\mathrm{mgu}(s, t)$. A term $s$ is said to be more general than $t$ if there exists a substitution $\theta$ that makes $s\theta = t$ but there is no substitution $\sigma$ such that $t\sigma = s$. Two terms $s$ and $t$ are said to be equal modulo renaming if there exist injective $\theta, \sigma$ such that $s\theta = t$ and $t\sigma = s$. The relations "less general than", "equal modulo renaming", and their union are represented respectively by the symbols $\sqsupset$, $\equiv$, and $\sqsupseteq$.
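Both "less general than" and "equal modulo renaming" reduce to one-way matching. The following minimal Python sketch assumes an encoding chosen only for illustration (not taken from the paper): variables are plain strings, a compound term is a tuple `(symbol, arg1, ...)`, and constants are one-element tuples such as `('a',)`. The helper names `match`, `less_general` and `equal_modulo_renaming` are hypothetical.

```python
def match(pattern, term, subst=None):
    """One-way matching: return sigma with pattern*sigma == term, or None.
    Only variables of `pattern` are bound; `term` is treated as fixed."""
    subst = {} if subst is None else subst
    if isinstance(pattern, str):                      # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None                                   # symbol clash, or term is a variable
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def less_general(s, t):
    """s ⊐ t: s is an instance of t, but t is not an instance of s."""
    return match(t, s) is not None and match(s, t) is None


def equal_modulo_renaming(s, t):
    """s ≡ t: each term is an instance of the other, i.e. they are variants."""
    return match(s, t) is not None and match(t, s) is not None
```

Because `match` never binds variables of its second argument, testing it in both directions is what separates a strict instance such as $f(x,x)$ of $f(x,y)$ from a mere variant such as $f(y,z)$.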

A more refined notion of instance is that of closure [6]. Closures are pairs $e \cdot \sigma$ that are said to represent the expression $e\sigma$ while retaining information about the original term and its instantiation. Closures where $e\sigma$ is ground are said to be ground closures. Let $\mathrm{GClos}(e) = \{e \cdot \sigma \mid e\sigma \text{ is ground}\}$ be the set of ground closures of $e$. Overloading the notation for sets, if $N$ is a set of clauses then $\mathrm{GClos}(N) = \bigcup_{C \in N} \mathrm{GClos}(C)$.

We write $s[t]$ if $t$ is a subterm of $s$. If also $s \neq t$, then it is a strict subterm. We denote these relations by $s \trianglerighteq t$ and $s \triangleright t$ respectively. We write $s[t \mapsto t']$ to denote the term obtained from $s$ by replacing all occurrences of $t$ by $t'$.

A (strict) partial order is a binary relation which is transitive ($a \succ b \succ c \Rightarrow a \succ c$), irreflexive ($a \nsucc a$), and asymmetric ($a \succ b \Rightarrow b \nsucc a$). A (non-strict) partial preorder (or quasiorder) is any transitive, reflexive relation. A (pre)order is total over $X$ if $\forall x, y \in X.\ x \succeq y \lor y \succeq x$. Whenever a non-strict (pre)order $\succeq$ is given, the induced equivalence relation $\sim$ is ${\succeq} \cap {\preceq}$, and the induced strict (pre)order $\succ$ is ${\succeq} \setminus {\sim}$. The transitive closure of a relation $\succ$, the smallest transitive relation that contains $\succ$, is denoted by $\succ^{+}$. A transitive reduction of a relation $\succ$, the smallest relation whose transitive closure is $\succ$, is denoted by $\succ^{-}$.

For an ordering $\succ$ over a set $X$, its multiset extension $\succ\!\!\succ$ over multisets of $X$ is given by: $A \succ\!\!\succ B$ iff $A \neq B$ and $\forall x \in B.\ B(x) > A(x) \Rightarrow \exists y \in A.\ y \succ x \land A(y) > B(y)$, where $A(x)$ is the number of occurrences of element $x$ in multiset $A$ (we also use $\succeq\!\!\succeq$ for the multiset extension of $\succeq$). It is well known that the multiset extension of a well-founded/total order is also a well-founded/total order, respectively [9]. The ($n$-fold) lexicographic extension of $\succ$ over $X$ is denoted $\succ^{\mathrm{lex}}$ over ordered $n$-tuples of $X$, and is given by $(x_1, \ldots, x_n) \succ^{\mathrm{lex}} (y_1, \ldots, y_n)$ iff $\exists i.\ x_1 = y_1 \land \cdots \land x_{i-1} = y_{i-1} \land x_i \succ y_i$. The lexicographic extension of a well-founded/total order is also a well-founded/total order, respectively.
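Both extensions translate directly into code. The Python sketch below is illustrative only: multisets are given as lists, and the base ordering is passed in as a hypothetical predicate `gt(a, b)` standing for $a \succ b$.

```python
from collections import Counter


def multiset_greater(A, B, gt):
    """A ≻≻ B (Dershowitz-Manna): A != B, and every x that B has more copies of
    than A is dominated by some y ≻ x that A has more copies of than B."""
    A, B = Counter(A), Counter(B)
    if A == B:
        return False
    return all(any(gt(y, x) and A[y] > B[y] for y in A)
               for x in B if B[x] > A[x])


def lex_greater(xs, ys, gt):
    """(x1..xn) ≻lex (y1..yn): the first position where the tuples differ decides."""
    for x, y in zip(xs, ys):
        if x != y:
            return gt(x, y)
    return False
```

For example, with integers ordered by `>`, the multiset `[3]` dominates `[2, 2, 1]` because the single 3 exceeds every element it replaces, while `[1, 2]` and `[2, 1]` are the same multiset and hence not strictly ordered.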

A binary relation $\to$ over the set of terms is a rewrite relation if (i) $l \to r \Rightarrow l\sigma \to r\sigma$ and (ii) $l \to r \Rightarrow s[l] \to s[l \mapsto r]$. The reflexive-transitive closure of a relation is the smallest reflexive-transitive relation which contains it. It is denoted by $\to^{*}$. Two terms are joinable ($s \downarrow t$) if $s \to^{*} u$ and $t \to^{*} u$ for some term $u$.
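For a finite, terminating set of rules $l \to r$ (for instance, rules oriented by a reduction ordering), joinability can be tested by comparing normal forms. This is exact when the rules are also confluent and only a sufficient check otherwise. The Python sketch below assumes an illustrative encoding (variables are plain strings, compound terms are tuples `(symbol, arg1, ...)`, constants are one-element tuples) and assumes $\mathrm{Vars}(r) \subseteq \mathrm{Vars}(l)$ for every rule; all function names are hypothetical.

```python
def match(pattern, term, subst):
    """One-way matching: extend subst so that pattern*subst == term, or return None."""
    if isinstance(pattern, str):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def apply_subst(term, subst):
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(apply_subst(a, subst) for a in term[1:])


def rewrite_once(term, rules):
    """One step l·σ -> r·σ at the leftmost-outermost redex, or None if irreducible."""
    for lhs, rhs in rules:
        sigma = match(lhs, term, {})
        if sigma is not None:
            return apply_subst(rhs, sigma)
    if isinstance(term, str):
        return None
    for i, arg in enumerate(term[1:], start=1):
        new_arg = rewrite_once(arg, rules)
        if new_arg is not None:
            return term[:i] + (new_arg,) + term[i + 1:]
    return None


def normal_form(term, rules):
    """Rewrite until no rule applies (terminates only if the rules terminate)."""
    while True:
        nxt = rewrite_once(term, rules)
        if nxt is None:
            return term
        term = nxt


def joinable_by_normal_forms(s, t, rules):
    """Sufficient test for s ↓ t; exact if the rules are also confluent."""
    return normal_form(s, rules) == normal_form(t, rules)
```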

If a rewrite relation is also a strict ordering, then it is a rewrite ordering. A reduction ordering is a rewrite ordering which is well-founded. In this paper we consider reduction orderings which are total on ground terms; such orderings are also simplification orderings, i.e., they satisfy $s \triangleright t \Rightarrow s \succ t$.


3 Ordering 


In [11] we presented a novel proof of completeness of the superposition calculus 
based on the notion of closure redundancy, which enables the completeness of 
stronger redundancy criteria to be shown, including AC normalisation, AC join- 
ability, and encompassment demodulation. In this paper we use a slightly different 
closure ordering ($\succ_{cc}$), in order to extract better completeness conditions for the redundancy criteria that we present in this paper (the definition of closure redundant clause and closure redundant inference is parametrised by this $\succ_{cc}$).

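Let $\succ_t$ be a simplification ordering which is total on ground terms. We extend this first to an ordering on ground term closures, then to an ordering on ground clause closures. Let

$$ s\cdot\sigma \succ'_{tc} t\cdot\rho \iff \begin{cases} \text{either } s\sigma \succ_t t\rho, \\ \text{or else } s\sigma = t\rho \text{ and } s \sqsupset t, \end{cases} \tag{1} $$

where $s\sigma$ and $t\rho$ are ground, and let $\succ_{tc}$ be an (arbitrary) total well-founded extension of $\succ'_{tc}$. We extend this to an ordering on clause closures. First let

$$ \mathcal{M}_{lc}((s \approx t)\cdot\theta) = \{ s\theta\cdot\varepsilon,\ t\theta\cdot\varepsilon \}, \tag{2} $$

$$ \mathcal{M}_{lc}((s \not\approx t)\cdot\theta) = \{ s\theta\cdot\varepsilon,\ t\theta\cdot\varepsilon,\ s\theta\cdot\varepsilon,\ t\theta\cdot\varepsilon \}, \tag{3} $$

and let $\mathcal{M}_{cc}$ be defined as follows, depending on whether the clause is unit or non-unit:

$$ \mathcal{M}_{cc}((s \approx t)\cdot\theta) = \{\{ s\cdot\theta \}, \{ t\cdot\theta \}\}, \tag{5} $$

$$ \mathcal{M}_{cc}((s \not\approx t)\cdot\theta) = \{\{ s\cdot\theta,\ t\cdot\theta,\ s\theta\cdot\varepsilon,\ t\theta\cdot\varepsilon \}\}, \tag{6} $$

$$ \mathcal{M}_{cc}((s \mathrel{\dot\approx} t \lor \cdots)\cdot\theta) = \{ \mathcal{M}_{lc}(L\cdot\theta) \mid L \in (s \mathrel{\dot\approx} t \lor \cdots) \}, \tag{7} $$

then $\succ_{cc}$ is defined by

$$ C\cdot\theta \succ_{cc} D\cdot\rho \iff \mathcal{M}_{cc}(C\cdot\theta) \ggg_{tc} \mathcal{M}_{cc}(D\cdot\rho), \tag{8} $$

where $\ggg_{tc}$ denotes the two-fold multiset extension of $\succ_{tc}$.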


The main purpose of this definition is twofold: (i) that when $s\theta \succ_t t\theta$ and $u$ occurs in a clause $D\rho$, then $s\theta \prec_t u$ or $s \sqsubset s\theta = u$ implies $(s \approx t)\cdot\theta \prec_{cc} D\cdot\rho$, and (ii) that when $C$ is a positive unit clause, $D$ is not, $s$ is the maximal subterm in $C\theta$ and $t$ is the maximal subterm in $D\sigma$, then $s \preceq_t t$ implies $C\cdot\theta \prec_{cc} D\cdot\sigma$. These two properties enable unconditional rewrites via oriented unit equations on positive unit clauses to succeed whenever they would also succeed in unfailing completion [4], and rewrites on negative unit and non-unit clauses to always succeed. This will enable us to prove the correctness of the simplification rules presented in the following section.


4 Redundancies 


In this section we present several redundancy criteria for the superposition calculus and prove their completeness. Recall the definitions in [11]: a clause $C$ is redundant in a set $S$ if all its ground closures $C\cdot\theta$ follow from closures in $\mathrm{GClos}(S)$ which are smaller wrt. $\succ_{cc}$; an inference $C_1, \ldots, C_n \vdash D$ is redundant in a set $S$ if, for all $\theta \in \mathrm{GSubs}(C_1, \ldots, C_n, D)$ such that $C_1\theta, \ldots, C_n\theta \vdash D\theta$ is a valid inference, the closure $D\cdot\theta$ follows from closures in $\mathrm{GClos}(S)$ such that each is smaller than some $C_i\cdot\theta$ ($1 \le i \le n$). These definitions (in terms of ground closures rather than in terms of ground clauses, as in [19]) arise because they enable us to justify stronger redundancy criteria for application in superposition theorem provers, including the AC criteria developed in [11] and the criteria in this section.


Theorem 1. The superposition calculus [19] is refutationally complete wrt. clo- 
sure redundancy, that is, if a set of clauses is saturated up to closure redundancy 
(meaning any inference with non-redundant premises in the set is redundant) 
and does not contain the empty clause, then it is satisfiable. 


Proof. The proof of completeness of the superposition calculus wrt. this closure 
ordering carries over from [11] with some modifications, which are presented in 
a full version of this paper [12]. 


4.1 Encompassment Demodulation 
We introduce the following definition, to be re-used throughout the paper. 


Definition 1. A rewrite via $l \approx r$ in clause $C[l\theta]$ is admissible if one of the following conditions holds: (i) $C$ is not a positive unit, or (letting $C = s[l\theta] \approx t$ for some $\theta$) (ii) $l\theta \neq s$, or (iii) $l\theta \sqsupset l$, or (iv) $s \prec_t t$, or (v) $r\theta \prec_t t$.¹


1 We note that (iv) is superfluous, but we include it since in practice it is easier to 
check, as it is local to the clause being rewritten and therefore needs to be checked 
only once, while (v) needs to be checked with each demodulation attempt. 
We then have

$$ \text{Encompassment Demodulation} \quad \frac{l \approx r \qquad C[l\theta]}{C[r\theta]} \quad \text{where } l\theta \succ_t r\theta \text{, and the rewrite via } l \approx r \text{ in } C \text{ is admissible.} \tag{9} $$

In other words, given an equation $l \approx r$, if an instance $l\theta$ is a subterm in $C$, then the rewrite is admissible (meaning, for example, that an unconditional rewrite is allowed when $l\theta \succ_t r\theta$) if $C$ is not a positive unit, or if $l\theta$ occurs at a strict subterm position, or if $l\theta$ is less general than $l$, or if $l\theta$ occurs outside a maximal side, or if $r\theta$ is smaller than the other side. This restriction is much weaker than the one given for the usual demodulation rule in superposition [17], and equivalent to the one in equational completion when we restrict ourselves to unit equalities [4].


Example 1. If $f(x) \succ s$, we can use $f(x) \approx s$ to rewrite $f(x) \approx t$ when $s \prec_t t$, and $f(a) \approx t$, $f(x) \not\approx t$, or $f(x) \approx t \lor C$ regardless of how $s$ and $t$ compare.
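Definition 1 is a disjunction of five cheap checks. The Python sketch below takes the outcome of each check as a precomputed boolean; how the comparisons themselves are computed (e.g. by matching and by $\succ_t$) is left abstract. The `Rewrite` record and its field names are hypothetical, and the test cases mirror Example 1 (note that the field `side_smaller` refers to the side $s$ of Definition 1, i.e. the side containing $l\theta$).

```python
from dataclasses import dataclass


@dataclass
class Rewrite:
    """Facts about rewriting the instance l·θ inside clause C with the equation l ≈ r."""
    positive_unit: bool       # C is a positive unit s ≈ t
    at_top: bool              # l·θ is the whole side s, not a strict subterm of it
    strict_instance: bool     # l·θ ⊐ l, i.e. θ is not a renaming of l
    side_smaller: bool        # (iv) s ≺_t t
    rhs_smaller: bool         # (v)  r·θ ≺_t t


def admissible(rw: Rewrite) -> bool:
    """Definition 1: at least one of the conditions (i)-(v) must hold."""
    return (not rw.positive_unit       # (i)
            or not rw.at_top           # (ii) l·θ ≠ s
            or rw.strict_instance      # (iii)
            or rw.side_smaller         # (iv)
            or rw.rhs_smaller)         # (v)
```

The ordering of the checks follows the footnote's remark: the clause-local conditions come first, and the per-rewrite comparison (v) is consulted last.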


4.2 General Ground Joinability 


In [11] we developed redundancy criteria for the theory of AC functions in the 
superposition calculus. In this section we extend these techniques to develop 
redundancy criteria for ground joinability in arbitrary equational theories. 


Definition 2. Two terms are strongly joinable ($s \Downarrow t$), in a clause $C$ wrt. a set of equations $S$, if either $s = t$, or $s \to s[l_1\sigma_1 \mapsto r_1\sigma_1] \Downarrow t$ via rules $l_i \approx r_i \in S$, where the rewrite via $l_1 \approx r_1$ is admissible in $C$, or $s \to s[l_1\sigma_1 \mapsto r_1\sigma_1] \Downarrow t[l_2\sigma_2 \mapsto r_2\sigma_2] \leftarrow t$ via rules $l_i \approx r_i \in S$, where the rewrites via $l_1 \approx r_1$ and $l_2 \approx r_2$ are admissible in $C$. To make the ordering explicit, we may write $s \Downarrow_{\succ} t$. Two terms are strongly ground joinable ($s \mathrel{\hat{\Downarrow}} t$), in a clause $C$ wrt. a set of equations $S$, if for all $\theta \in \mathrm{GSubs}(s,t)$ we have $s\theta \Downarrow t\theta$ in $C$ wrt. $S$.


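We then have:

$$ \text{Ground joinability} \quad \frac{s \approx t \lor C \qquad S}{S} \quad \text{where } s \mathrel{\hat{\Downarrow}} t \text{ in } s \approx t \lor C \text{ wrt. } S, \tag{10a} $$

$$ \text{Ground joinability} \quad \frac{s \not\approx t \lor C \qquad S}{C \qquad S} \quad \text{where } s \mathrel{\hat{\Downarrow}} t \text{ in } s \not\approx t \lor C \text{ wrt. } S. \tag{10b} $$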


Theorem 2. Ground joinability is a sound and admissible redundancy criterion 
of the superposition calculus wrt. closure redundancy. 


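Proof. We will show the positive case first. If $s \mathrel{\hat{\Downarrow}} t$, then for any instance $(s \approx t \lor C)\cdot\theta$ we either have $s\theta = t\theta$, and therefore $\emptyset \models (s \approx t)\cdot\theta$, or we have wlog. $s\theta \succ_t t\theta$, with $s\theta \Downarrow t\theta$. Then $s\theta$ and $t\theta$ can be rewritten to the same normal form $u$ by $l_i\sigma_i \to r_i\sigma_i$ where $l_i \approx r_i \in S$. Since $u \prec_t s\theta$ and $u \preceq_t t\theta$, $(s \approx t \lor C)\cdot\theta$ follows from smaller $(u \approx u \lor C)\cdot\theta$² (a tautology, i.e. it follows from $\emptyset$) and from the instances of clauses in $S$ used to rewrite $s\theta \to^{*} u \leftarrow^{*} t\theta$. It only remains to show that these latter instances are also smaller than $(s \approx t \lor C)\cdot\theta$. Since we have assumed $s\theta \succ_t t\theta$, at least one rewrite step must be done on $s\theta$. Let $l_1\sigma_1 \to r_1\sigma_1$ be the instance of the rule used for that step, with $(l_1 \approx r_1)\cdot\sigma_1$ the closure that generates it. By Definitions 1 and 2, one of the following holds:

- $C \neq \emptyset$, therefore $(l_1 \approx r_1)\cdot\sigma_1 \prec_{cc} (s \approx t \lor C)\cdot\theta$, or
- $l_1\sigma_1 \triangleleft s\theta$, therefore $l_1\sigma_1 \prec_t s\theta \Rightarrow l_1\cdot\sigma_1 \prec_{tc} s\cdot\theta \Rightarrow (l_1 \approx r_1)\cdot\sigma_1 \prec_{cc} (s \approx t)\cdot\theta$, or
- $l_1\sigma_1 = s\theta$ and $s \sqsupset l_1$, therefore $l_1\cdot\sigma_1 \prec_{tc} s\cdot\theta \Rightarrow (l_1 \approx r_1)\cdot\sigma_1 \prec_{cc} (s \approx t)\cdot\theta$, or
- $l_1\sigma_1 = s\theta$ and $s \equiv l_1$ and $r_1\sigma_1 \prec_t t\theta$, therefore $r_1\cdot\sigma_1 \prec_{tc} t\cdot\theta \Rightarrow (l_1 \approx r_1)\cdot\sigma_1 \prec_{cc} (s \approx t)\cdot\theta$.

As for the remaining steps, they are done on the smaller side $t\theta$ or on the other side after this first rewrite, which is smaller than $s\theta$. Therefore all subsequent steps done by any $l_j\sigma_j \to r_j\sigma_j$ will have $r_j\cdot\sigma_j \prec_{tc} l_j\cdot\sigma_j \prec_{tc} s\cdot\theta \Rightarrow (l_j \approx r_j)\cdot\sigma_j \prec_{cc} (s \approx t \lor C)\cdot\theta$. As such, since this holds for all ground closures $(s \approx t \lor C)\cdot\theta$, $s \approx t \lor C$ is redundant wrt. $S$.

For the negative case, the proof is similar. We will conclude that $(s \not\approx t \lor C)\cdot\theta$ follows from smaller $(l_i \approx r_i)\cdot\sigma_i \in \mathrm{GClos}(S)$ and smaller $(u \not\approx u \lor C)\cdot\theta$. The latter, of course, follows from smaller $C\cdot\theta$, therefore $s \not\approx t \lor C$ is redundant wrt. $S \cup \{C\}$.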


Example 2. If $S = \{f(x,y) \approx f(y,x)\}$, then $f(x, f(y,z)) \approx f(x, f(z,y))$ is redundant wrt. $S$. Note that $f(x,y) \approx f(y,x)$ is not orientable by any simplification ordering, therefore this cannot be justified by demodulation alone.


Testing for Ground Joinability. The general criterion presented above raises the question of how to test, in practice, whether $s \mathrel{\hat{\Downarrow}} t$ in a clause $s \mathrel{\dot\approx} t \lor C$. Several such algorithms have been proposed [1,18,21]. All of these are based on the observation that if we consider all total preorders $\succeq_v$ on $\mathrm{Vars}(s,t)$ and for all of them show strong joinability with a modified ordering, which we denote $\succ_{t[v]}$, then we have shown strong ground joinability in the order $\succ_t$ [18].


Definition 3. A simplification order on terms $\succ_t$ extended with a preorder on variables $\succeq_v$, denoted $\succeq_{t[v]}$, is a simplification preorder (i.e. satisfies all the relevant properties in Sect. 2) such that ${\succeq_{t[v]}} \supseteq {\succ_t} \cup {\succeq_v}$.


Example 3. If $x \succeq_v y$, then $g(x) \succeq_{t[v]} g(y)$, $g(x) \succeq_{t[v]} y$, $f(x,x) \succeq_{t[v]} f(y,x)$, etc.


The simplest algorithm based on this approach would be to enumerate all possible total preorders $\succeq_v$ over $\mathrm{Vars}(s,t)$, and exhaustively reduce both sides via equations in $S$ orientable by $\succ_{t[v]}$, checking if the terms can be reduced to the same normal form for all total preorders. This is very inefficient since there are $O(n!/(\ln 2)^{n})$ such total preorders [7], where $n$ is the cardinality of $\mathrm{Vars}(s,t)$. Another approach is to consider only a smaller number of partial preorders, based on the obvious fact that $s \Downarrow_{\succ_{t[v]}} t \Rightarrow \forall {\succeq'_v} \supseteq {\succeq_v}.\ s \Downarrow_{\succ_{t[v']}} t$, so that joinability under a smaller number of partial preorders can imply joinability under all the total preorders, which is what is needed to prove ground joinability.

² Wlog. $u\theta = u$, renaming variables in $u$ if necessary.
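The blow-up of the naive approach is easy to observe. A total preorder on $n$ variables is an ordering of the variables with ties allowed, and the number of these is the ordered Bell (Fubini) number: 1, 3, 13, 75, 541 for $n = 1, \ldots, 5$. The brute-force Python enumerator below is written only for illustration (the function name is hypothetical); it represents each total preorder as a rank map.

```python
from itertools import product


def total_preorders(variables):
    """Yield every total preorder on `variables` as a rank map:
    x ⪰v y iff rank[x] >= rank[y], and equal ranks mean x ∼v y."""
    n = len(variables)
    seen = set()
    for ranks in product(range(n), repeat=n):
        levels = sorted(set(ranks))
        key = tuple(levels.index(r) for r in ranks)   # normalise ranks to 0..k-1
        if key not in seen:
            seen.add(key)
            yield dict(zip(variables, key))
```

With only five variables, the naive method would already have to normalise both sides under 541 different extended orderings, which motivates checking fewer, partial preorders.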

However, this poses the question of how to choose which partial preorders to check. Intuitively, for performance, we would like that, whenever the two terms are not ground joinable, some total preorder under which they are not joinable is found as early as possible, and that, whenever the two terms are joinable, all total preorders are covered by as few partial preorders as possible.


Example 4. Let $S = \{f(x, f(y,z)) \approx f(y, f(x,z))\}$. Then $f(x, f(y, f(z, f(w,u)))) \approx f(x, f(y, f(w, f(z,u))))$ can be shown to be ground joinable wrt. $S$ by checking just three cases, $\succeq_v \in \{z \succ w,\ z \sim w,\ z \prec w\}$, even though there are 6942 possible preorders.


Waldmeister first tries all partial preorders relating two variables among 
Vars(s,t), then three, etc. until success, failure (by trying a total order and fail- 
ing to join) or reaching a predefined limit of attempts [1]. Twee tries an arbitrary 
total strict order, then tries to weaken it, and repeats until all total preorders are 
covered [21]. We propose a novel algorithm, incremental ground joinability, whose main improvement is guiding the process of picking which preorders to check by finding, during the process of searching for rewrites on subterms of the terms we are attempting to join, minimal extensions of the term order with a variable preorder which allow the rewrite to be done in the $\succ$ direction.

Our algorithm is summarised as follows. We start with a queue $V$ of variable preorders, initially containing only the empty preorder. Then, while $V$ is not empty, we pop a preorder $\succeq_v$ from the queue, and attempt to perform a rewrite via an equation which is newly orientable by some extension $\succeq'_v$ of $\succeq_v$. That is, during the process of finding generalisations of a subterm of $s$ or $t$ among left-hand sides of candidate unoriented unit equations $l \approx r$, when we check that the instance $l\theta \approx r\theta$ used to rewrite is oriented, we try to force this to be true under some minimal extension $\succ_{t[v']}$ of $\succ_{t[v]}$, if possible. If no such rewrite exists, the two terms are not strongly joinable under $\succ_{t[v]}$ or any extension, and so are not strongly ground joinable and we are done. If it exists, we exhaustively rewrite with $\succ_{t[v']}$, and check if we obtain the same normal form. If we do not obtain it yet, we repeat the process of searching for rewrites via equations orientable by further extensions of the preorder. But if we do, then we have proven joinability in the extended preorder; now we must add back to the queue a set of preorders $O$ such that all the total preorders which are $\supseteq {\succeq_v}$ (popped from the queue) but not $\supseteq {\succeq'_v}$ (the minimal extension under which we have proven joinability) are $\supseteq$ some ${\succeq''_v} \in O$ (pushed back into the queue to be checked). Obtaining this $O$ is implemented by $\mathrm{order\_diff}(\succeq_v, \succeq'_v)$, defined below. Whenever there are no more preorders in the queue to check, we have checked that the terms are strongly joinable under all possible total preorders, and we are done.


Ground Joinability and Connectedness in the Superposition Calculus 177 


Together with this, some book-keeping for keeping track of completeness 
conditions is necessary. We know that for completeness to be guaranteed, the 
conditions in Definition 1 must hold. They automatically do if C is not a positive 
unit or if the rewrite happens on a strict subterm. We also know that after a 
term has been rewritten at least once, rewrites on that side are always complete 
(since it was rewritten to a smaller term). Therefore we store in the queue, together with the preorder, a flag in $\mathcal{P}(\{L, R\})$ indicating on which sides a top rewrite needs to be checked for completeness. Initially the flag is $\{L\}$ if $s \succ_t t$, $\{R\}$ if $s \prec_t t$, $\{L, R\}$ if $s$ and $t$ are incomparable, and $\{\}$ if the clause is not a positive unit. When a rewrite at the top is attempted (say, $l \approx r$ used to rewrite $s = l\theta$ with $t$ being the other side), if the flag for that side is set, then we check if $l\theta \sqsupset l$ or $r\theta \prec_t t$. If this fails, the rewrite is rejected. Whenever a side is rewritten (at any position), the flag for that side is cleared.
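This bookkeeping needs only a small amount of state per queue entry. The Python sketch below is illustrative and makes some assumptions of its own: flags are frozensets over `'L'`/`'R'`, the comparison of $s$ and $t$ under $\succ_t$ is supplied as `'>'`, `'<'` or `None` (incomparable), and the function names are hypothetical.

```python
def initial_flag(positive_unit, cmp):
    """Sides whose top rewrites still need the explicit completeness check."""
    if not positive_unit:
        return frozenset()
    return {'>': frozenset({'L'}), '<': frozenset({'R'})}.get(cmp, frozenset({'L', 'R'}))


def top_rewrite_allowed(flag, side, strict_instance, rhs_smaller_than_other):
    """Top rewrite of `side` via lθ -> rθ: always allowed if the flag is clear,
    otherwise require lθ ⊐ l or rθ ≺_t (other side)."""
    return side not in flag or strict_instance or rhs_smaller_than_other


def clear_after_rewrite(flag, side):
    """Once a side has been rewritten (at any position), its top rewrites are safe."""
    return flag - {side}
```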

The definition of order_diff is as follows. Let the transitive reduction of $\succeq$ be represented by a set of links of the form $x \succ y$ or $x \sim y$.


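$$ \mathrm{order\_diff}(\succeq_1, \succeq_2) = \{ {\succeq^{*}} \mid {\succeq} \in \mathrm{order\_diff}'(\succeq_1, \succeq_2^{-}) \}, \tag{11a} $$

$$ \mathrm{order\_diff}'(\succeq_1, \succeq_2) = \begin{cases}
\{ {\succeq_1} \cup \{y \succ x\},\ {\succeq_1} \cup \{x \sim y\} \} \cup \mathrm{order\_diff}'({\succeq_1} \cup \{x \succ y\}, \succeq_2) & \text{if } x \succ y \in {\succeq_2},\ x \succ y \notin {\succeq_1}, \\
\{ {\succeq_1} \cup \{x \succ y\},\ {\succeq_1} \cup \{y \succ x\} \} \cup \mathrm{order\_diff}'({\succeq_1} \cup \{x \sim y\}, \succeq_2) & \text{if } x \sim y \in {\succeq_2},\ x \sim y \notin {\succeq_1}, \\
\emptyset & \text{otherwise,}
\end{cases} \tag{11b} $$

where ${\succeq_1} \subseteq {\succeq_2}$. In other words, we take a transitive reduction of $\succeq_2$, and for all links $\ell$ in that reduction which are not part of $\succeq_1$, we return orders $\succeq_1$ augmented with the reverse of $\ell$ and recurse with ${\succeq_1} := {\succeq_1} \cup \ell$.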
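The recursion in (11b) can be unrolled into a loop over the links of a transitive reduction. The Python sketch below rests on assumptions of our own: a preorder is a closed set of pairs `(a, b)` meaning $a \succeq b$, links are tagged `'>'` or `'~'`, the reduction is computed by a simple quadratic procedure, and $\supseteq$ between preorders is read as an extension that keeps strict pairs strict (which is what the uniqueness claim of Theorem 3 appears to require). The brute-force `total_preorders` helper is included only to check Theorem 3 on small examples; all names are hypothetical.

```python
from itertools import product


def closure(pairs, elems):
    """Reflexive-transitive closure (Warshall); (a, b) means a ⪰ b."""
    rel = set(pairs) | {(e, e) for e in elems}
    for k in elems:
        for i in elems:
            if (i, k) in rel:
                for j in elems:
                    if (k, j) in rel:
                        rel.add((i, j))
    return frozenset(rel)


def strict(pre, a, b):
    return (a, b) in pre and (b, a) not in pre


def reduction(pre, elems):
    """Reduction links: ('~', x, y) chains inside each equivalence class,
    then ('>', x, y) covering edges between class representatives."""
    classes, seen = [], set()
    for e in sorted(elems):
        if e not in seen:
            cls = sorted(f for f in elems if (e, f) in pre and (f, e) in pre)
            seen.update(cls)
            classes.append(cls)
    links = [('~', a, b) for cls in classes for a, b in zip(cls, cls[1:])]
    reps = [cls[0] for cls in classes]
    for a, b in product(reps, reps):
        if strict(pre, a, b) and not any(strict(pre, a, c) and strict(pre, c, b) for c in reps):
            links.append(('>', a, b))
    return links


def holds(pre, kind, x, y):
    return ((x, y) in pre and (y, x) in pre) if kind == '~' else strict(pre, x, y)


def order_diff(p1, p2, elems):
    """Preorders covering the total extensions of p1 that do not extend p2 (p1 ⊆ p2)."""
    out, cur = [], p1
    for kind, x, y in reduction(p2, elems):
        if holds(cur, kind, x, y):
            continue
        if kind == '>':
            out += [closure(cur | {(y, x)}, elems),           # y ≻ x
                    closure(cur | {(x, y), (y, x)}, elems)]   # x ∼ y
            cur = closure(cur | {(x, y)}, elems)              # keep x ≻ y and continue
        else:
            out += [closure(cur | {(x, y)}, elems),           # x ≻ y
                    closure(cur | {(y, x)}, elems)]           # y ≻ x
            cur = closure(cur | {(x, y), (y, x)}, elems)      # keep x ∼ y and continue
    return out


def extends(total, pre):
    """total ⊇ pre, with every strict pair of pre still strict in total."""
    return all((a, b) in total and ((b, a) in pre or (b, a) not in total) for a, b in pre)


def total_preorders(elems):
    """All total preorders on elems as pair sets (brute force, for checking only)."""
    found = set()
    for ranks in product(range(len(elems)), repeat=len(elems)):
        rank = dict(zip(elems, ranks))
        found.add(frozenset((a, b) for a in elems for b in elems if rank[a] >= rank[b]))
    return found
```

For $\succeq_1 = \emptyset$ and $\succeq_2 = (x \succ y \succ z)$, the sketch returns four preorders, $y \succ x$, $x \sim y$, $x \succ y \land z \succ y$ and $x \succ y \sim z$. Together with $\succeq_2$, these split the 13 total preorders on three variables without overlap.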


Example 5. [Table: $\succeq_1$ | $\succeq_2$ | $\mathrm{order\_diff}(\succeq_1, \succeq_2)$]


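Theorem 3. For all total ${\succeq^T} \supseteq {\succeq_1}$, there exists one and only one ${\succeq_i} \in \{\succeq_2\} \cup \mathrm{order\_diff}(\succeq_1, \succeq_2)$ such that ${\succeq^T} \supseteq {\succeq_i}$. For all ${\succeq^T} \not\supseteq {\succeq_1}$, there is no ${\succeq_i} \in \{\succeq_2\} \cup \mathrm{order\_diff}(\succeq_1, \succeq_2)$ such that ${\succeq^T} \supseteq {\succeq_i}$.

Proof. See full version of the paper [12].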


An algorithm based on searching for rewrites in minimal extensions of a variable preorder (starting with minimal extensions of the bare term ordering, $\succ_{t[\emptyset]}$) has several advantages. The main benefit of this approach is that, instead of imposing an a priori ordering on variables and then checking joinability under that ordering, we instead build a minimal ordering while searching for candidate unit equations to rewrite subterms of $s, t$. For instance, if two terms are not ground joinable, or not even rewritable in any $\succ_{t[v]}$ where they were not rewritable in $\succ_t$, then an approach such as the one used in Avenhaus, Hillenbrand and Löchner [1] cannot detect this until it has extended the preorder arbitrarily to a total ordering, while our incremental algorithm immediately realises this. We should note that empirically this is what happens in most cases: most of the literals we check during a run are not ground joinable, so for practical performance it is essential to optimise this case.


Theorem 4. Algorithm 1 returns "Success" only if $s \mathrel{\hat{\Downarrow}} t$ in $C$ wrt. $S$.³


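Proof. We will show that Algorithm 1 returns "Success" if and only if $s \Downarrow_{\succ_{t[T]}} t$ for all total $\succeq^T$ over $\mathrm{Vars}(s,t)$, which implies $s \mathrel{\hat{\Downarrow}} t$.

When $(\succeq_v, s, t, c)$ is popped from $V$, we exhaustively reduce $s, t$ via equations in $S$ oriented wrt. $\succ_{t[v]}$, obtaining $s'', t''$. If $s'' \sim_{t[v]} t''$, then $s \Downarrow_{\succ_{t[v]}} t$, and so $s \Downarrow_{\succ_{t[T]}} t$ for all total ${\succeq^T} \supseteq {\succeq_v}$. If $s'' \not\sim_{t[v]} t''$, we will attempt to rewrite one of $s'', t''$ using some extended $\succ_{t[v']}$ where ${\succeq'_v} \supseteq {\succeq_v}$. If this is impossible, then $s \not\Downarrow_{\succ_{t[v']}} t$ for any ${\succeq'_v} \supseteq {\succeq_v}$, and therefore there exists at least one total $\succeq^T$ such that $s \not\Downarrow_{\succ_{t[T]}} t$, and we return "Fail".

If this is possible, then we repeat the process: we exhaustively reduce wrt. $\succ_{t[v']}$, obtaining $s', t'$. If $s' \not\sim_{t[v']} t'$, then we start the process again from the step where we attempt to rewrite via an extension of $\succeq'_v$: we either find a rewrite with some $\succ_{t[v'']}$ with ${\succeq''_v} \supseteq {\succeq'_v}$, and exhaustively normalise wrt. $\succ_{t[v'']}$ obtaining $s''', t'''$, etc., or we fail to do so and return "Fail".

If in any such step (after exhaustively normalising wrt. $\succ_{t[v']}$) we find $s' \sim_{t[v']} t'$, then $s \Downarrow_{\succ_{t[v']}} t$, and so $s \Downarrow_{\succ_{t[T]}} t$ for all total ${\succeq^T} \supseteq {\succeq'_v}$. Now at this point we must add back to the queue a set of preorders $\succeq''_i$ such that: for all total ${\succeq^T} \supseteq {\succeq_v}$, either ${\succeq^T} \supseteq {\succeq'_v}$ (proven to be strongly joinable) or ${\succeq^T} \supseteq$ some $\succeq''_i$ (added to $V$ to be checked). For efficiency, we would also like there to be no overlap: no total ${\succeq^T} \supseteq {\succeq_v}$ is an extension of more than one of $\{\succeq'_v, \succeq''_1, \ldots\}$.

This is true because of Theorem 3. So we add $\{(\succeq''_i, s'', t'', c'') \mid {\succeq''_i} \in \mathrm{order\_diff}(\succeq_v, \succeq'_v)\}$ to $V$, where $c'' = c \setminus (\text{if } s'' \neq s \text{ then } \{L\} \text{ else } \{\}) \setminus (\text{if } t'' \neq t \text{ then } \{R\} \text{ else } \{\})$. Note also that $s \to^{*} s''$ and $t \to^{*} t''$, therefore also $s \Downarrow_{\succ_{t[T]}} s''$ and $t \Downarrow_{\succ_{t[T]}} t''$ for all total ${\succeq^T} \supseteq {\succeq''_i}$.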

³ Note that the other direction may not always hold: there are strongly ground joinable terms which are not detected by this method of analysing all preorders between variables, e.g. $f(x, g(y)) \mathrel{\hat{\Downarrow}} f(g(y), x)$ wrt. $S = \{f(x,y) \approx f(y,x)\}$.
Algorithm 1: Incremental ground joinability test
Input: literal s ≈̇ t ∈ C; set of unorientable equations S
Output: whether s ⇓̂ t in C wrt. S
begin
    c ← ∅ if C is not a positive unit, {L} if s ≻ t, {R} if s ≺ t, {L, R} otherwise
    V ← {(∅, s, t, c)}
    while V is not empty do
        (⪰_v, s, t, c) ← pop from V
        s, t ← normalise s, t wrt. ≻_t[v], with completeness flag c
        c ← c \ ({L} if s was changed) \ ({R} if t was changed)
        if s ∼_t[v] t then
            continue
        else
            s', t', c' ← s, t, c
            while there exists l ≈ r ∈ S that can rewrite s' or t' wrt. some
                    ≻_t[v'] with ⪰'_v ⊇ ⪰_v, with completeness flag c' do
                s', t' ← normalise s', t' wrt. ≻_t[v'], with completeness flag c'
                c' ← c' \ ({L} if s' was changed) \ ({R} if t' was changed)
                if s' ∼_t[v'] t' then
                    for ⪰''_v in order_diff(⪰_v, ⪰'_v) do push (⪰''_v, s, t, c) to V
                    break
                end
            else    (reached only if the loop condition fails before any break)
                return Fail
            end
        end
    end
    return Success
end

where rewriting u in s, t wrt. ≻ with completeness flag c succeeds if
(i) u is a strict subterm of s or t,
(ii) u = s with L ∉ c,
(iii) u = t with R ∉ c,
(iv) the instance lσ ≈ rσ used to rewrite has lσ ⊐ l,
(v) u = s with rσ ≺ t,
(vi) or u = t with rσ ≺ s.


During this whole process, any rewrites must pass the completeness test mentioned previously, such that the conditions in the definition of $\Downarrow$ hold. Let $s_0, t_0$ be the original terms, $s, t$ the ones being rewritten, and $c$ the completeness flag. If the rewrite is at a strict subterm position, it succeeds by Definition 2. If the rewrite is at the top, then we check $c$. If $L$ is unset ($L \notin c$), then either $s = s_0 \prec_t t_0$ or $s \prec_t s_0$ or the clause is not a positive unit, so we allow a rewrite at the top of $s$, again by Definition 2. If $L$ is set ($L \in c$), then an explicit check must be done: we allow a rewrite at the top of $s$ ($= s_0$) iff it is done by $l\sigma \to r\sigma$ with $l\sigma \sqsupset l$ or $r\sigma \prec_t t_0$. The case of $R$ is symmetric, with the roles of $s$ and $t$ swapped. In short, we have shown that if $(\succeq_v, s', t', c')$ is popped from $V$, then $V$ can only ever become empty, and hence the algorithm can only terminate with "Success", if $s' \Downarrow_{\succ_{t[T]}} t'$ for all total ${\succeq^T} \supseteq {\succeq_v}$. Since $V$ is initialised with $(\emptyset, s, t, c)$, the algorithm only returns "Success" if $s \Downarrow_{\succ_{t[T]}} t$ for all total $\succeq^T$.


Orienting via Extension of Variable Ordering. In order to apply the ground joinability algorithm we need a way to check, for a given $\succ_t$ and $\succeq_v$ and some $s, t$, whether there exists a ${\succeq'_v} \supseteq {\succeq_v}$ such that $s \succ_{t[v']} t$. Here we show how to do this when $\succ_t$ is a Knuth-Bendix Ordering (KBO) [15].

Recall the definition of KBO. Let >, be a partial order on symbols, w be 
an N-valued weight function on symbols and variables, with the property that 
dm Va € V. w(x) = m, w(c) > m for all constants c, and there may only exist 
one unary symbol f with w(f) = 0 and in this case f >, g for all other symbols 
g. For terms, their weight is w(f(s1,...)) = w(f) +w(si)+---. Let also |s|, be 
the number of occurrences of x in s. Then 


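$$
\begin{aligned}
f(s_1,\dots) \succ_{\mathrm{KBO}} g(t_1,\dots) \iff{} & \bigl(w(f(s_1,\dots)) > w(g(t_1,\dots)) \\
&\ \text{or } w(f(s_1,\dots)) = w(g(t_1,\dots)) \text{ and } f \succ_s g \\
&\ \text{or } w(f(s_1,\dots)) = w(g(t_1,\dots)) \text{ and } f = g \text{ and } s_1,\dots \succ_{\mathrm{KBO}}^{\mathrm{lex}} t_1,\dots\bigr) \\
& \text{and } \forall x \in V.\ |f(s_1,\dots)|_x \ge |g(t_1,\dots)|_x && \text{(12a)}\\
f(s_1,\dots) \succ_{\mathrm{KBO}} x \iff{} & |f(s_1,\dots)|_x \ge 1 && \text{(12b)}\\
x \succ_{\mathrm{KBO}} y \iff{} & \bot && \text{(12c)}
\end{aligned}
$$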


The conditions on variable occurrences ensure that s ≻_KBO t ⇒ ∀θ. sθ ≻_KBO tθ.
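Definition (12) can be transcribed almost line by line. The following is a minimal sketch, assuming a tuple encoding of terms (variables are strings, applications are tuples whose first entry is the symbol) and a precedence given as integer ranks, where equal ranks are read as incomparable; it does not check the admissibility conditions on w (constants, the unique weight-0 unary symbol).

```python
from collections import Counter

# Terms: a variable is a str; an application is a tuple (symbol, arg1, ..., argn),
# so a constant c is written ("c",). This encoding is an assumption of the sketch.


def weight(t, w, w_var):
    """w(t): symbol weights plus w_var for every variable occurrence."""
    if isinstance(t, str):
        return w_var
    return w[t[0]] + sum(weight(a, w, w_var) for a in t[1:])


def var_counts(t, acc=None):
    """|t|_x for every variable x occurring in t."""
    acc = Counter() if acc is None else acc
    if isinstance(t, str):
        acc[t] += 1
    else:
        for a in t[1:]:
            var_counts(a, acc)
    return acc


def kbo_greater(s, t, w, w_var, prec):
    """s ≻_KBO t following (12a)-(12c); prec[f] > prec[g] encodes f ≻_s g."""
    if isinstance(s, str):
        return False                        # (12c); a variable is never greater
    if isinstance(t, str):
        return var_counts(s)[t] >= 1        # (12b)
    cs, ct = var_counts(s), var_counts(t)
    if any(cs[x] < n for x, n in ct.items()):
        return False                        # variable condition of (12a)
    ws, wt = weight(s, w, w_var), weight(t, w, w_var)
    if ws != wt:
        return ws > wt
    if s[0] != t[0]:
        return prec[s[0]] > prec[t[0]]
    for a, b in zip(s[1:], t[1:]):          # lexicographic comparison of arguments
        if a != b:
            return kbo_greater(a, b, w, w_var, prec)
    return False


# Example signature (hypothetical): all weights 1, precedence h > f > g > a.
W = {"f": 1, "g": 1, "h": 1, "a": 1}
P = {"h": 3, "f": 2, "g": 1, "a": 0}
```

For instance, h(x, y) and h(y, x) are incomparable under this ordering in both directions, which is exactly the situation where extending the order with a variable preorder helps.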

When we extend the order ≻_KBO with a variable preorder ⪰_v, the starting point is that x ≻_v y ⇒ x ≻_KBO[v] y and x ∼_v y ⇒ x ∼_KBO[v] y. Then, to ensure that all the properties of a simplification order (including the one mentioned above) hold, we arrive at the following definition (similar to [1]).


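$$
\begin{aligned}
f(s_1,\dots) \succ_{\mathrm{KBO}[v]} g(t_1,\dots) \iff{} & \bigl(w(f(s_1,\dots)) > w(g(t_1,\dots)) \\
&\ \text{or } w(f(s_1,\dots)) = w(g(t_1,\dots)) \text{ and } f \succ_s g \\
&\ \text{or } w(f(s_1,\dots)) = w(g(t_1,\dots)) \text{ and } f = g \text{ and } s_1,\dots \succ_{\mathrm{KBO}[v]}^{\mathrm{lex}} t_1,\dots\bigr) \\
& \text{and } \forall x \in V.\ \textstyle\sum_{y \succeq_v x} |f(s_1,\dots)|_y \ge \sum_{y \succeq_v x} |g(t_1,\dots)|_y && \text{(13a)}\\
f(s_1,\dots) \succ_{\mathrm{KBO}[v]} x \iff{} & \textstyle\sum_{y \succeq_v x} |f(s_1,\dots)|_y \ge 1 && \text{(13b)}\\
x \succ_{\mathrm{KBO}[v]} y \iff{} & x \succ_v y && \text{(13c)}
\end{aligned}
$$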
To check whether there exists a ⪰′_v ⊇ ⪰_v such that s ≻_KBO[v′] t, we need to check whether there are some x ≻ y or x ∼ y relations that we can add to ⪰_v such that all the conditions above hold (and such that it still remains a valid preorder). Let us denote "there exists a ⪰′_v ⊇ ⪰_v such that s ≻_KBO[v′] t" by s ≻_KBO[v,v′] t. Then the definition is


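$$
\begin{aligned}
f(s_1,\dots) \succ_{\mathrm{KBO}[v,v']} g(t_1,\dots) \iff{} & \bigl(w(f(s_1,\dots)) > w(g(t_1,\dots)) \\
&\ \text{or } w(f(s_1,\dots)) = w(g(t_1,\dots)) \text{ and } f \succ_s g \\
&\ \text{or } w(f(s_1,\dots)) = w(g(t_1,\dots)) \text{ and } f = g \text{ and } s_1,\dots \succ_{\mathrm{KBO}[v,v']}^{\mathrm{lex}} t_1,\dots\bigr) \\
& \text{and } \exists x_1, y_1, \dots \text{ such that } {\succeq'_v} = ({\succeq_v} \cup \{(x_1, y_1), \dots\})^{*} \text{ is a preorder} \\
& \quad \text{such that } \forall x \in V.\ \textstyle\sum_{y \succeq'_v x} |f(s_1,\dots)|_y \ge \sum_{y \succeq'_v x} |g(t_1,\dots)|_y && \text{(14a)}\\
f(s_1,\dots) \succ_{\mathrm{KBO}[v,v']} x \iff{} & \exists y.\ |f(s_1,\dots)|_y \ge 1 \text{ with } {\succeq'_v} = {\succeq_v} \cup \{y \succ x\} \text{ or } {\succeq'_v} = {\succeq_v} \cup \{y \sim x\} && \text{(14b)}\\
x \succ_{\mathrm{KBO}[v,v']} y \iff{} & x \succ'_v y \text{ with } {\succeq'_v} = {\succeq_v} \cup \{x \succ y\} && \text{(14c)}
\end{aligned}
$$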


This check can be used in Algorithm 1 for finding extensions of variable order- 
ings that orient rewrite rules allowing required normalisations. 


4.3 Connectedness 


Testing for joinability (i.e. demodulating to s ≈ s or s ≉ s) and ground joinability (presented in the previous section) requires that each step in proving them is done via an oriented instance of an equation in the set. However, we can weaken this restriction if we also change the notion of redundancy being used.

As criteria for redundancy of a clause, finding either joinability or ground joinability of a literal in the clause means that the clause can be deleted or the literal removed from the clause (in case of a positive or negative literal, resp.) in any context; that is, we can for example add them to a set of deleted clauses, and if any new clause appears in that set, immediately remove it, since we already saw that it is redundant. The criterion of connectedness [3,21], however, is a criterion for redundancy of inferences. This means that a conclusion simplified by this criterion can be deleted (or rather, not added), but in that context only; if it ever comes up again as a conclusion of a different inference, then it is not necessarily also redundant. Connectedness was introduced in the context of equational completion; here we extend it to general clauses and show that it is a redundancy criterion in the superposition calculus.


Definition 4. Terms s and t are connected under clauses U and unifier ρ wrt. a set of equations S if there exist terms v₁, …, vₙ, equations l₁ ≈ r₁, …, lₙ₋₁ ≈ rₙ₋₁, and substitutions σ₁, …, σₙ₋₁ such that:
(i) v₁ = s and vₙ = t,
(ii) for all i ∈ 1, …, n−1, either vᵢ₊₁ = vᵢ[lᵢσᵢ ↦ rᵢσᵢ] or vᵢ = vᵢ₊₁[lᵢσᵢ ↦ rᵢσᵢ], with lᵢ ≈ rᵢ ∈ S,
(iii) for all i ∈ 1, …, n−1, there exists w in ⋃_{C∈U} ⋃_{p≈q∈C} {p, q}⁴ such that for uᵢ ∈ {lᵢ, rᵢ}, either (a) uᵢσᵢ ≺ wρ, or (b) uᵢσᵢ = wρ and either uᵢ ⊏ w or w ∈ C such that C is not a positive unit.


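Theorem 5. Superposition inferences of the form

$$
\frac{l \approx r \lor C \qquad s[u] \approx t \lor D}{(s[u \mapsto r] \approx t \lor C \lor D)\rho}
\qquad \text{where } \rho = \mathrm{mgu}(l, u),\ l\rho \not\preceq r\rho,\ s\rho \not\preceq t\rho,\ \text{and } u \text{ not a variable,} \qquad (15)
$$

where s[u ↦ r]ρ and tρ are connected under {l ≈ r ∨ C, s ≈ t ∨ D} and unifier ρ wrt. some set of clauses S, are redundant inferences wrt. S.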


Proof. Let us denote s′ = s[u ↦ r]. Let also U = {l ≈ r ∨ C, s ≈ t ∨ D} and M = ⋃_{C∈U} ⋃_{p≈q∈C} {p, q}. We will show that if s′ρ and tρ are connected under U and ρ, by equations in S, then every instance of that inference obeys the condition for closure redundancy of an inference (see Sect. 4) wrt. S.

Consider any (s′ ≈ t ∨ C ∨ D)ρ·θ where θ ∈ GSubs(Uρ). Either s′ρθ = tρθ, and we are done (it follows from ∅), or s′ρθ ≻ tρθ, or s′ρθ ≺ tρθ.

Consider the case s′ρθ ≻ tρθ. For all i ∈ 1, …, n−1, there exists a C′ ∈ U and a w ∈ C′ such that either (iii.a) lᵢσᵢθ ≺ wρθ, or (iii.b) lᵢσᵢθ = wρθ and lᵢ ⊏ w, or (iii.b) lᵢσᵢθ = wρθ and C′ is not a positive unit. Likewise for rᵢ. Therefore, for all i ∈ 1, …, n−1, there exists a C′ ∈ U such that (lᵢ ≈ rᵢ)·σᵢθ ≺ C′·ρθ. Since (t ≈ t ∨ ⋯)ρ·θ is also smaller than (s′ ≈ t ∨ ⋯)ρ·θ and is a tautology, the instance (s′ ≈ t ∨ ⋯)ρ·θ of the conclusion follows from closures in GClos(S) each of which is smaller than one of (l ≈ r ∨ C)·ρθ, (s ≈ t ∨ D)·ρθ.

In the case that s′ρθ ≺ tρθ, the same idea applies, but now it is (s′ ≈ s′ ∨ ⋯)ρ·θ which is smaller than (s′ ≈ t ∨ ⋯)ρ·θ and is a tautology.

Therefore, we have shown that for all θ ∈ GSubs((l ≈ r ∨ C)ρ, (s ≈ t ∨ D)ρ), the instance (s′ ≈ t ∨ ⋯)ρ·θ of the conclusion follows from closures in GClos(S) which are all smaller than one of (l ≈ r ∨ C)·ρθ, (s ≈ t ∨ D)·ρθ. Since any valid superposition inference with ground clauses has to have lθ′ = uθ′, any θ′ ∈ GSubs(l ≈ r ∨ C, s ≈ t ∨ D, (s′ ≈ t ∨ C ∨ D)ρ) such that the inference (l ≈ r ∨ C)θ′, (s ≈ t ∨ D)θ′ ⊢ (s′ ≈ t ∨ C ∨ D)ρθ′ is valid must have θ′ = ρθ″, since ρ is the most general unifier. Therefore, we have shown that for all θ′ ∈ GSubs(l ≈ r ∨ C, s ≈ t ∨ D, (s′ ≈ t ∨ C ∨ D)ρ) for which (l ≈ r ∨ C)θ′, (s ≈ t ∨ D)θ′ ⊢ (s′ ≈ t ∨ C ∨ D)θ′ is a valid superposition inference, the instance (s′ ≈ t ∨ ⋯)ρ·θ″ of the conclusion follows from closures in GClos(S) which are all smaller than one of (l ≈ r ∨ C)·θ′, (s ≈ t ∨ D)·θ′, so the inference is redundant.


4 That is, in the set of top-level terms of literals of clauses in U. 
Theorem 6. Superposition inferences of the form

$$
\frac{l \approx r \lor C \qquad s[u] \not\approx t \lor D}{(s[u \mapsto r] \not\approx t \lor C \lor D)\rho}
\qquad \text{where } \rho = \mathrm{mgu}(l, u),\ l\rho \not\preceq r\rho,\ s\rho \not\preceq t\rho,\ \text{and } u \text{ not a variable,} \qquad (16)
$$

where s[u ↦ r]ρ and tρ are connected under {l ≈ r ∨ C, s ≉ t ∨ D} and unifier ρ wrt. some set of clauses S, are redundant inferences wrt. S ∪ {(C ∨ D)ρ}.


Proof. Analogously to the previous proof, we find that for all instances of the inference, the closure (s′ ≉ t ∨ ⋯)ρ·θ follows from the smaller closure (t ≉ t ∨ ⋯)ρ·θ or (s′ ≉ s′ ∨ ⋯)ρ·θ and closures (lᵢ ≈ rᵢ)·σᵢθ smaller than max{(l ≈ r ∨ C)·θ, (s ≉ t ∨ D)·θ, (s′ ≉ t ∨ C ∨ D)ρ·θ}. But (t ≉ t ∨ C ∨ D)ρ·θ and (s′ ≉ s′ ∨ C ∨ D)ρ·θ both follow from the smaller (C ∨ D)ρ·θ; therefore the inference is redundant wrt. S ∪ {(C ∨ D)ρ}.


4.4 Ground Connectedness 


Just as joinability can be generalised to ground joinability, so can connectedness be generalised to ground connectedness. Two terms s, t are ground connected under U and ρ wrt. S if, for all θ ∈ GSubs(s, t), sθ and tθ are connected under U and ρ wrt. S. Analogously to strong ground joinability, we have that if s and t are connected using ≻_[v] for all total ⪰_v over Vars(s, t), then s and t are ground connected.
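Both strong ground joinability and the sufficient condition for ground connectedness above quantify over all total preorders ⪰_v on Vars(s, t). The sketch below is our own illustration of the naive alternative, not the algorithm of this paper: it enumerates these preorders explicitly as ordered partitions (greatest class first). Their number grows quickly (13 for three variables, 75 for four), which is why an algorithm that checks as few preorders as possible matters.

```python
def total_preorders(variables):
    """Yield every total preorder on `variables` as a list of equivalence
    classes (sets), ordered from greatest to least."""
    variables = list(variables)
    if not variables:
        yield []
        return
    first, rest = variables[0], variables[1:]
    for smaller in total_preorders(rest):
        # `first` joins an existing class (it becomes equivalent to its members) ...
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] | {first}] + smaller[i + 1:]
        # ... or forms a new singleton class at any position in the chain
        for i in range(len(smaller) + 1):
            yield smaller[:i] + [{first}] + smaller[i:]


def as_geq(classes):
    """Non-reflexive pairs (y, x) with y ⪰ x for a preorder given as classes."""
    return {(y, x)
            for i, upper in enumerate(classes)
            for lower in classes[i:]
            for y in upper for x in lower if y != x}
```

Each generated preorder could then be converted with `as_geq` and passed to an implementation of (13), one preorder at a time.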


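Theorem 7. Superposition inferences of the form

$$
\frac{l \approx r \lor C \qquad s[u] \approx t \lor D}{(s[u \mapsto r] \approx t \lor C \lor D)\rho}
\qquad \text{where } \rho = \mathrm{mgu}(l, u),\ l\rho \not\preceq r\rho,\ s\rho \not\preceq t\rho,\ \text{and } u \text{ not a variable,} \qquad (17)
$$

where s[u ↦ r]ρ and tρ are ground connected under {l ≈ r ∨ C, s ≈ t ∨ D} and unifier ρ wrt. some set of clauses S, are redundant inferences wrt. S.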
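Theorem 8. Superposition inferences of the form

$$
\frac{l \approx r \lor C \qquad s[u] \not\approx t \lor D}{(s[u \mapsto r] \not\approx t \lor C \lor D)\rho}
\qquad \text{where } \rho = \mathrm{mgu}(l, u),\ l\rho \not\preceq r\rho,\ s\rho \not\preceq t\rho,\ \text{and } u \text{ not a variable,} \qquad (18)
$$

where s[u ↦ r]ρ and tρ are ground connected under {l ≈ r ∨ C, s ≉ t ∨ D} and unifier ρ wrt. some set of clauses S, are redundant inferences wrt. S ∪ {(C ∨ D)ρ}.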


Proof. The proofs of Theorems 7 and 8 are analogous to those of Theorems 5 and 6. The weakening of connectedness to ground connectedness only means that the proof of connectedness (e.g. the vᵢ, lᵢ ≈ rᵢ, σᵢ) may be different for different ground instances. For all the steps in the proof to hold, we only need that for all the instances θ ∈ GSubs(l ≈ r ∨ C, s ≈ t ∨ D, (s[u ↦ r] ≈ t ∨ C ∨ D)ρ) of the inference, θ = σθ′ with σ ∈ GSubs(s[u ↦ r]ρ, tρ), which is true.


Discussion about the strategy for implementation of connectedness and ground 
connectedness is outside the scope of this paper. 
5 Evaluation


We implemented ground joinability in a theorem prover for first-order logic, iProver [10,16].⁵ iProver combines superposition, Inst-Gen, and resolution calculi. For superposition, iProver implements a range of simplifications including encompassment demodulation, AC normalisation [10], light normalisation [16], subsumption and subsumption resolution. We ran our experiments over FOF problems of the TPTP v7.5 library [23] (17348 problems) on a cluster of Linux servers with 3 GHz 11-core CPUs and 128 GB memory, with each problem running on a single core with a time limit of 300 s. We used a default strategy (which has not yet been fine-tuned after the introduction of ground joinability), with superposition enabled and the rest of the components disabled. With ground joinability enabled, iProver solved 133 problems that it did not solve without ground joinability. Note that this excludes the contribution of AC ground joinability and encompassment demodulation [11] (always enabled).

Some of the problems are not interesting for this analysis because ground joinability is not even tried, either because they are solved before superposition saturation begins, or because they are ground. If we exclude these, we are left with 10005 problems. Ground joinability is successfully used to eliminate clauses in 3057 of them (30.6%, Fig. 1a). This indicates that ground joinability is useful in many classes of problems, including non-unit problems, where it had never been used before.


[Figure 1: two bar charts. Panel (a) has bins 0, 0–10, 10–10², 10²–10³, 10³–10⁴ (clauses simplified by ground joinability); panel (b) has bins 0–0.1%, 0.1–1%, 1–10%, 10–20%, >20% (share of runtime spent in ground joinability).]

Fig. 1. (a) Clauses simplified by ground joinability. (b) % of runtime spent in ground joinability.


In terms of the performance impact of enabling ground joinability, we measured that among problems whose runtime exceeds 1 s, the time spent inside the ground joinability algorithm exceeds 20% of runtime in only 72 out of 8574 problems, indicating that our incremental algorithm is efficient and suitable for broad application (Fig. 1b).


5 iProver is available at http://www.cs.man.ac.uk/~korovink/iprover.
TPTP classifies problems by a rating in [0, 1]. Problems with rating > 0.9 are considered to be very challenging, and problems with rating 1.0 have never been solved by any automated theorem prover. iProver using ground joinability solves 3 previously unsolved problems of rating 1.0, and 7 further problems with rating in [0.9, 1.0) (Table 1). We note that some of the latter (e.g. LAT140-1, ROB018-10, REL045-1) were previously only solved by UEQ or SMT provers, but not by any full first-order prover.


Table 1. Hard or unsolved problems in TPTP, solved by iProver with ground joinability.

Name        Rating    Name        Rating
LAT140-1    0.90      ROB018-18   0.95
REL045-1    0.90      LCL477+1    0.97
LCL557+1    0.92      LCL478+1    1.00
LCL563+1    0.92      CSR039+6    1.00
LCL474+1    0.94      CSR040+6    1.00


6 Conclusion and Further Work 


In this work we extended the superposition calculus with ground joinability and connectedness, and proved that these rules preserve completeness using a modified notion of redundancy, thus making these techniques available for full first-order logic problems for the first time. We have also presented an algorithm for checking ground joinability which attempts to check as few variable preorders as possible.

Preliminary results show three things: (1) ground joinability is applicable in 
a sizeable number of problems across different domains, including in non-unit 
problems (where it was never applied before), (2) our proposed algorithm for 
checking ground joinability is efficient, with over 3 of problems spending less 
than 1% of runtime there, and (3) application of ground joinability in the super- 
position calculus of iProver improves overall performance, including discovering 
solutions to hitherto unsolved problems. 

These results are promising, and further optimisations can be done. Imme- 
diate next steps include fine-tuning the implementation, namely adjusting the 
strategies and strategy combinations to make full use of ground joinability and 
connectedness. iProver uses a sophisticated heuristic system which has not yet 
been tuned for ground joinability and connectedness [14]. 

In terms of practical implementation of connectedness and ground connect- 
edness, further research is needed on the interplay between those (criteria for 
redundancy of inferences) and joinability and ground joinability (criteria for 
redundancy of clauses). 

On the theoretical level, recent work [24] provides a general framework for 
saturation theorem proving, and we will investigate how techniques developed 
in this paper can be incorporated into this framework. 
Abstract. Abduction in description logics finds extensions of a knowl- 
edge base to make it entail an observation. As such, it can be used to 
explain why the observation does not follow, to repair incomplete knowl- 
edge bases, and to provide possible explanations for unexpected observa- 
tions. We consider TBox abduction in the lightweight description logic 
EL, where the observation is a concept inclusion and the background 
knowledge is a TBox, i.e., a set of concept inclusions. To avoid useless 
answers, such problems usually come with further restrictions on the 
solution space and/or minimality criteria that help sort the chaff from 
the grain. We argue that existing minimality notions are insufficient, and 
introduce connection minimality. This criterion follows Occam’s razor by 
rejecting hypotheses that use concept inclusions unrelated to the problem 
at hand. We show how to compute a special class of connection-minimal 
hypotheses in a sound and complete way. Our technique is based on a 
translation to first-order logic, and constructs hypotheses based on prime 
implicates. We evaluate a prototype implementation of our approach on 
ontologies from the medical domain. 


1 Introduction 


Ontologies are used in areas like biomedicine or the semantic web to represent and reason about terminological knowledge. They normally consist of a set of axioms formulated in a description logic (DL), giving definitions of concepts, or stating relations between them. In the lightweight description logic EL [2], particularly used in the biomedical domain, we find ontologies that contain around a hundred thousand axioms. For instance, SNOMED CT¹ contains over 350,000 axioms, and the Gene Ontology GO² defines over 50,000 concepts. A central

1 https://www.snomed.org/.
2 http://geneontology.org/.
reasoning task for ontologies is to determine whether one concept is subsumed 
by another, a question that can be answered in polynomial time [1], and rather 
efficiently in practice using highly optimized description logic reasoners [29]. If 
the answer to this question is unexpected or hints at an error, a natural inter- 
est is in an explanation for that answer—especially if the ontology is complex. 
But whereas explaining entailments—i.e., explaining why a concept subsump- 
tion holds—is well-researched in the DL literature and integrated into standard 
ontology editors [21,22], the problem of explaining non-entailments has received 
less attention, and there is no standard tool support. Classical approaches involve 
counter-examples [5], or abduction. 

In abduction, a non-entailment 𝒯 ⊭ α, for a TBox 𝒯 and an observation α, is explained by providing a “missing piece”, the hypothesis, that, when added to the ontology, would entail α. Thus it provides possible fixes in case the entailment should hold. In the DL context, depending on the shape of the observation, one distinguishes between concept abduction [6], ABox abduction [7–10,12,19,24,25,30,31], TBox abduction [11,33] or knowledge base abduction [14,26]. We focus here on TBox abduction, where the ontology and hypothesis are TBoxes and the observation is a concept inclusion (CI), i.e., a single TBox axiom.

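To illustrate this problem, consider the following TBox about academia,

$$
\begin{aligned}
\mathcal{T}_a = \{\, & \exists \mathsf{employment}.\mathsf{ResearchPosition} \sqcap \exists \mathsf{qualification}.\mathsf{Diploma} \sqsubseteq \mathsf{Researcher}, \\
& \exists \mathsf{writes}.\mathsf{ResearchPaper} \sqsubseteq \mathsf{Researcher},\quad \mathsf{Doctor} \sqsubseteq \exists \mathsf{qualification}.\mathsf{PhD}, \\
& \mathsf{Professor} \equiv \mathsf{Doctor} \sqcap \exists \mathsf{employment}.\mathsf{Chair}, \\
& \mathsf{FundsProvider} \sqsubseteq \exists \mathsf{writes}.\mathsf{GrantApplication} \,\}
\end{aligned}
$$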


that states, in natural language:

• “Being employed in a research position and having a qualifying diploma implies being a researcher.”
• “Writing a research paper implies being a researcher.”
• “Being a doctor implies holding a PhD qualification.”
• “Being a professor is being a doctor employed at a (university) chair.”
• “Being a funds provider implies writing grant applications.”


The observation αₐ = Professor ⊑ Researcher, “Being a professor implies being a researcher”, does not follow from 𝒯ₐ, although it should. We can use TBox abduction to find different ways of recovering this entailment.

Commonly, to avoid trivial answers, the user provides syntactic restrictions 
on hypotheses, such as a set of abducible axioms to pick from [8,30], a set 
of abducible predicates [25,26], or patterns on the shape of the solution [11]. 
But even with those restrictions in place, there may be many possible solutions 
and, to find the ones with the best explanatory potential, syntactic criteria 
are usually combined with minimality criteria such as subset minimality, size 
minimality, or semantic minimality [7]. Even combined, these minimality criteria 
still retain a major flaw. They allow for explanations that go against the principle 
of parsimony, also known as Occam’s razor, in that they may contain concepts 
that are completely unrelated to the problem at hand. As an illustration, let us return to our academia example. The TBoxes


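$$
\mathcal{H}_{a1} = \{\mathsf{Chair} \sqsubseteq \mathsf{ResearchPosition},\ \mathsf{PhD} \sqsubseteq \mathsf{Diploma}\} \quad\text{and}\quad
\mathcal{H}_{a2} = \{\mathsf{Professor} \sqsubseteq \mathsf{FundsProvider},\ \mathsf{GrantApplication} \sqsubseteq \mathsf{ResearchPaper}\}
$$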


are two hypotheses solving the TBox abduction problem involving 𝒯ₐ and αₐ. Both of them are subset-minimal, have the same size, and are incomparable w.r.t. the entailment relation, so that traditional minimality criteria cannot distinguish them. However, intuitively, the second hypothesis feels more arbitrary than the first. Looking at ℋₐ₁, Chair and ResearchPosition occur in 𝒯ₐ in concept inclusions where the concepts in αₐ also occur, and both PhD and Diploma are similarly related to αₐ, but via the role qualification. In contrast, ℋₐ₂ involves the concepts FundsProvider and GrantApplication, which are not related to αₐ in any way in 𝒯ₐ. In fact, any random concept inclusion A ⊑ ∃writes.B in 𝒯ₐ would lead to a hypothesis similar to ℋₐ₂ where A replaces FundsProvider and B replaces GrantApplication. Such explanations are not parsimonious.
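That both ℋₐ₁ and ℋₐ₂ restore the entailment can be checked mechanically. The sketch below uses the classical completion-rule procedure for EL subsumption (a standard EL reasoning technique, not the first-order method developed in this paper) on a hand-normalised version of 𝒯ₐ. The tuple encoding and the fresh names X1, X2, X3 introduced by the normalisation are assumptions of the sketch.

```python
# Normalised CIs (hypothetical encoding for this sketch):
#   ("sub", A, B)        A ⊑ B
#   ("conj", A1, A2, B)  A1 ⊓ A2 ⊑ B
#   ("ex_r", A, r, B)    A ⊑ ∃r.B
#   ("ex_l", r, A, B)    ∃r.A ⊑ B
# X1, X2, X3 are fresh names introduced when normalising T_a by hand.
T_A = [
    ("ex_l", "employment", "ResearchPosition", "X1"),
    ("ex_l", "qualification", "Diploma", "X2"),
    ("conj", "X1", "X2", "Researcher"),
    ("ex_l", "writes", "ResearchPaper", "Researcher"),
    ("ex_r", "Doctor", "qualification", "PhD"),
    ("sub", "Professor", "Doctor"),                # Professor ≡ Doctor ⊓ ∃employment.Chair:
    ("ex_r", "Professor", "employment", "Chair"),  # left-to-right direction
    ("ex_l", "employment", "Chair", "X3"),         # right-to-left direction
    ("conj", "Doctor", "X3", "Professor"),
    ("ex_r", "FundsProvider", "writes", "GrantApplication"),
]
H_A1 = [("sub", "Chair", "ResearchPosition"), ("sub", "PhD", "Diploma")]
H_A2 = [("sub", "Professor", "FundsProvider"),
        ("sub", "GrantApplication", "ResearchPaper")]

# Tuple positions that hold concept names, per axiom kind.
NAME_SLOTS = {"sub": (1, 2), "conj": (1, 2, 3), "ex_r": (1, 3), "ex_l": (2, 3)}


def saturate(axioms):
    """Apply the EL completion rules CR1-CR4 until a fixpoint is reached."""
    names = {ax[i] for ax in axioms for i in NAME_SLOTS[ax[0]]}
    S = {a: {a} for a in names}  # S[A]: atomic subsumers of A derived so far
    R = set()                    # (r, A, B): A ⊑ ∃r.B derived so far
    changed = True
    while changed:
        changed = False
        for a in names:
            derived = set()
            for ax in axioms:
                kind = ax[0]
                if kind == "sub" and ax[1] in S[a]:                       # CR1
                    derived.add(ax[2])
                elif kind == "conj" and ax[1] in S[a] and ax[2] in S[a]:  # CR2
                    derived.add(ax[3])
                elif kind == "ex_r" and ax[1] in S[a]:                    # CR3
                    edge = (ax[2], a, ax[3])
                    if edge not in R:
                        R.add(edge)
                        changed = True
                elif kind == "ex_l":                                      # CR4
                    _, r, filler, target = ax
                    if any(r2 == r and a2 == a and filler in S[b] for r2, a2, b in R):
                        derived.add(target)
            if not derived <= S[a]:
                S[a] |= derived
                changed = True
    return S


def entails(axioms, lhs, rhs):
    """Does the normalised TBox entail lhs ⊑ rhs for atomic lhs, rhs?"""
    return rhs == lhs or rhs in saturate(axioms).get(lhs, set())
```

Here `entails(T_A, "Professor", "Researcher")` is false, while adding either ℋₐ₁ or ℋₐ₂ makes it true, so the reasoner alone cannot tell the parsimonious hypothesis from the arbitrary one.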

We introduce a new minimality criterion called connection minimality that is parsimonious (Sect. 3), defined for the lightweight description logic EL. This criterion characterizes hypotheses for 𝒯 and α that connect the left- and right-hand sides of the observation α without introducing spurious connections. To achieve this, every left-hand side of a CI in the hypothesis must follow from the left-hand side of α in 𝒯, and, taken together, all the right-hand sides of the CIs in the hypothesis must imply the right-hand side of α in 𝒯, as is the case for ℋₐ₁. To compute connection-minimal hypotheses in practice, we present a technique based on first-order reasoning that proceeds in three steps (Sect. 4). First, we translate the abduction problem into a first-order formula Φ. We then compute the prime implicates of Φ, that is, a set of minimal logical consequences of Φ that subsume all other consequences of Φ. In the final step, we construct, based on those prime implicates, solutions to the original problem. We prove that all hypotheses generated in this way satisfy the connection minimality criterion, and that the method is complete for a relevant subclass of connection-minimal hypotheses. We use the SPASS theorem prover [34] as a restricted SOS-resolution [18,35] engine for the computation of prime implicates in a prototype implementation (Sect. 5), and we present an experimental analysis of its performance on a set of biomedical ontologies (Sect. 6). Our results indicate that our method can in many cases be applied in practice to compute connection-minimal hypotheses. A technical report companion to this paper includes all proofs as well as a detailed example of our method as appendices [16].

There are not many techniques that can handle TBox abduction in EL or more expressive DLs [11,26,33]. In [11], instead of a set of abducibles, a set of justification patterns is given, in which the solutions have to fit. An arbitrary oracle function is used to decide whether a solution is admissible or not (which may use abducibles, justification patterns, or something else), and it is shown that deciding the existence of hypotheses is tractable. However, unlike our approach, they only consider atomic CIs in hypotheses, while we also allow for hypotheses involving conjunction. The setting from [33] also considers EL, and abduction under various minimality notions such as subset minimality and size minimality. It presents practical algorithms, and an evaluation of an implementation for an always-true informativeness oracle (i.e., limited to subset minimality). Unlike our approach, it uses an external DL reasoner to decide entailment relationships. In contrast, we present an approach that directly exploits first-order reasoning, and thus has the potential to be generalisable to more expressive DLs.

While dedicated resolution calculi have been used before to solve abduction in DLs [9,26], to the best of our knowledge, the only work that relies on first-order reasoning for DL abduction is [24]. Similar to our approach, it uses SOS-resolution, but to perform ABox abduction for the more expressive DL ALC. Apart from the different problem solved, in contrast to [24] we also provide a semantic characterization of the hypotheses generated by our method. We believe this characterization to be a major contribution of our paper. It provides an intuition of what parsimony is for this problem, independently of one’s ease with first-order logic calculi, which should facilitate the adoption of this minimality criterion by the DL community. Thanks to this characterization, our technique is calculus agnostic. Any method to compute prime implicates in first-order logic can be a basis for our abduction technique, without additional theoretical work, which is not the case for [24]. Thus, abduction in EL can benefit from the latest advances in prime implicates generation in first-order logic.


2 Preliminaries 


We first recall the description logic EL and its translation to first-order logic [2], as well as TBox abduction in this logic.

Let N_C and N_R be pairwise disjoint, countably infinite sets of unary predicates called atomic concepts and of binary predicates called roles, respectively. Generally, we use letters A, B, E, F, … for atomic concepts, and r for roles, possibly annotated. Letters C, D, possibly annotated, denote EL concepts, built according to the syntax rule


$$
C ::= \top \mid A \mid C \sqcap C \mid \exists r.C
$$


We implicitly represent EL conjunctions as sets, that is, without order, nested conjunctions, and multiple occurrences of a conjunct. We use ⨅{C₁, …, Cₘ} to abbreviate C₁ ⊓ … ⊓ Cₘ, and identify the empty conjunction (m = 0) with ⊤. An EL TBox 𝒯 is a finite set of concept inclusions (CIs) of the form C ⊑ D.

EL is a syntactic variant of a fragment of first-order logic that uses N_C and N_R as predicates. Specifically, TBoxes 𝒯 and CIs α correspond to closed first-order formulas π(𝒯) and π(α) resp., while concepts C correspond to open formulas π(C, x) with a free variable x. In particular, we have
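$$
\begin{aligned}
\pi(\top, x) &:= \mathit{true}, & \pi(\exists r.C, x) &:= \exists y.\,(r(x, y) \land \pi(C, y)),\\
\pi(A, x) &:= A(x), & \pi(C \sqsubseteq D) &:= \forall x.\,(\pi(C, x) \rightarrow \pi(D, x)),\\
\pi(C \sqcap D, x) &:= \pi(C, x) \land \pi(D, x), & \pi(\mathcal{T}) &:= \textstyle\bigwedge \{\pi(\alpha) \mid \alpha \in \mathcal{T}\}.
\end{aligned}
$$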


As is common, we often omit the ⋀ in conjunctions ⋀Φ, that is, we identify sets of formulas with the conjunction over those. The notions of a term t; an atom P(t̄) where t̄ is a sequence of terms; a positive literal P(t̄); a negative literal ¬P(t̄); and a clause, Horn, definite, positive or negative, are defined as usual for first-order logic, and so are entailment and satisfaction of first-order formulas.
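The translation π is a straightforward structural recursion. A minimal sketch, assuming a nested-tuple encoding of concepts and producing formulas as plain strings with ASCII connectives (`&`, `->`, `exists`, `forall`); the encoding, the string syntax, and the fresh-variable naming y1, y2, … are choices made for this illustration only.

```python
from itertools import count

# Concepts as nested tuples (an encoding chosen for this sketch):
#   ("top",) | ("atom", A) | ("and", C, D) | ("exists", r, C)


def pi(concept, x, fresh):
    """π(C, x): an open first-order formula (as a string) with free variable x."""
    kind = concept[0]
    if kind == "top":
        return "true"
    if kind == "atom":
        return f"{concept[1]}({x})"
    if kind == "and":
        return f"({pi(concept[1], x, fresh)} & {pi(concept[2], x, fresh)})"
    if kind == "exists":
        y = f"y{next(fresh)}"  # fresh bound variable for the role successor
        return f"exists {y}.({concept[1]}({x},{y}) & {pi(concept[2], y, fresh)})"
    raise ValueError(f"unknown constructor: {kind!r}")


def pi_ci(lhs, rhs):
    """π(C ⊑ D) = ∀x.(π(C, x) → π(D, x))."""
    fresh = count(1)
    return f"forall x.({pi(lhs, 'x', fresh)} -> {pi(rhs, 'x', fresh)})"


def pi_tbox(tbox):
    """π(T): the conjunction of the translations of its CIs."""
    return " & ".join(f"({pi_ci(lhs, rhs)})" for lhs, rhs in tbox)
```

For example, the CI Doctor ⊑ ∃qualification.PhD from 𝒯ₐ becomes `forall x.(Doctor(x) -> exists y1.(qualification(x,y1) & PhD(y1)))`.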

We identify CIs and TBoxes with their translation into first-order logic, and can thus speak of the entailment between formulas, CIs and TBoxes. When 𝒯 ⊨ C ⊑ D for some 𝒯, we call C a subsumee of D and D a subsumer of C. We adhere here to the definition of the word “subsume”: “to include or contain something else”, although the terminology is reversed in first-order logic. We say two TBoxes 𝒯₁, 𝒯₂ are equivalent, denoted 𝒯₁ ≡ 𝒯₂, iff 𝒯₁ ⊨ 𝒯₂ and 𝒯₂ ⊨ 𝒯₁. For example, {D ⊑ C₁, …, D ⊑ Cₙ} ≡ {D ⊑ C₁ ⊓ … ⊓ Cₙ}. It is well known that, due to the absence of concept negation, every EL TBox is consistent.

The abduction problem we are concerned with in this paper is the following: 


Definition 1. An EL TBox abduction problem (shortened to abduction problem) is a tuple ⟨𝒯, Σ, C₁ ⊑ C₂⟩, where 𝒯 is a TBox called the background knowledge, Σ is a set of atomic concepts called the abducible signature, and C₁ ⊑ C₂ is a CI called the observation, s.t. 𝒯 ⊭ C₁ ⊑ C₂. A solution to this problem is a TBox

$$
\mathcal{H} \subseteq \{A_1 \sqcap \dots \sqcap A_n \sqsubseteq B_1 \sqcap \dots \sqcap B_m \mid \{A_1, \dots, A_n, B_1, \dots, B_m\} \subseteq \Sigma\}
$$

where m > 0, n > 0 and such that 𝒯 ∪ ℋ ⊨ C₁ ⊑ C₂ and, for all CIs α ∈ ℋ, 𝒯 ⊭ α. A solution to an abduction problem is called a hypothesis.


For example, ℋₐ₁ and ℋₐ₂ are solutions for ⟨𝒯ₐ, Σ, αₐ⟩, as long as Σ contains all the atomic concepts that occur in them. Note that in our setting, as in [6,33], concept inclusions in a hypothesis are flat, i.e., they contain no existential role restrictions. While this restricts the solution space for a given problem, it is possible to bypass this limitation in a targeted way, by introducing fresh atomic concepts equivalent to a concept of interest. We exclude the consistency requirement 𝒯 ∪ ℋ ⊭ ⊥, which is given in other definitions of the DL abduction problem [25], since EL TBoxes are always consistent. We also allow m > 1 instead of the usual m = 1. This produces the same hypotheses modulo equivalence.

For simplicity, we assume in the following that the concepts C₁ and C₂ in the abduction problem are atomic. We can always introduce fresh atomic concepts A₁ and A₂ with A₁ ⊑ C₁ and C₂ ⊑ A₂ to solve the problem for complex concepts.

Common minimality criteria include subset minimality, size minimality and semantic minimality, which respectively favor ℋ over ℋ′ if: ℋ ⊊ ℋ′; the number of atomic concepts in ℋ is smaller than in ℋ′; and ℋ′ ⊨ ℋ but ℋ ⊭ ℋ′.
3 Connection-Minimal Abduction 


To address the lack of parsimony of common minimality criteria, illustrated in the academia example, we introduce connection minimality. Intuitively, connection minimality only accepts those hypotheses that ensure that every CI in the hypothesis is connected to both C₁ and C₂ in 𝒯, as is the case for ℋₐ₁ in the academia example. The definition of connection minimality is based on the following ideas: 1) Hypotheses for the abduction problem should create a connection between C₁ and C₂, which can be seen as a concept D that satisfies 𝒯 ∪ ℋ ⊨ C₁ ⊑ D, D ⊑ C₂. 2) To ensure parsimony, we want this connection to be based on concepts D₁ and D₂ for which we already have 𝒯 ⊨ C₁ ⊑ D₁, D₂ ⊑ C₂. This prevents the introduction of unrelated concepts in the hypothesis. Note however that D₁ and D₂ can be complex, thus the connection from C₁ to D₁ (resp. D₂ to C₂) can be established by arbitrarily long chains of concept inclusions. 3) We additionally want to make sure that the connecting concepts are not more complex than necessary, and that ℋ only contains CIs that directly connect parts of D₂ to parts of D₁ by closely following their structure.
To address point 1), we simply introduce connecting concepts formally. 


Definition 2. Let C₁ and C₂ be concepts. A concept D connects C₁ to C₂ in 𝒯 if and only if 𝒯 ⊨ C₁ ⊑ D and 𝒯 ⊨ D ⊑ C₂.


Note that if 𝒯 ⊨ C₁ ⊑ C₂ then both C₁ and C₂ are connecting concepts from C₁ to C₂, and if 𝒯 ⊭ C₁ ⊑ C₂, the case of interest, neither of them is.

To address point 2), we must capture how a hypothesis creates the connection between the concepts C₁ and C₂. As argued above, this is established via concepts D₁ and D₂ that satisfy 𝒯 ⊨ C₁ ⊑ D₁, D₂ ⊑ C₂. Note that having only two concepts D₁ and D₂ is exactly what makes the approach parsimonious. If there were only one concept, C₁ and C₂ would already be connected, and as soon as there are more than two concepts, hypotheses start becoming more arbitrary. For a very simple example with unrelated concepts, assume given a TBox that entails Lion ⊑ Felidae, Mammal ⊑ Animal and House ⊑ Building. A possible hypothesis to explain Lion ⊑ Animal is {Felidae ⊑ House, Building ⊑ Mammal}, but this explanation is more arbitrary than {Felidae ⊑ Mammal} (as is the case when comparing ℋₐ₂ with ℋₐ₁ in the academia example) because of the lack of connection of House ⊑ Building with both Lion and Animal. Clearly this CI could be replaced by any other CI entailed by 𝒯, which is what we want to avoid.

We can represent the structure of Dı and Də in graphs by using EL descrip- 
tion trees, originally from Baader et al. [3]. 


Definition 3. An EL description tree is a finite labeled tree 𝔗 = (V, E, v₀, l) where V is a set of nodes with root v₀ ∈ V, the nodes v ∈ V are labeled with l(v) ⊆ N_C, and the (directed) edges vrw ∈ E are such that v, w ∈ V and are labeled with r ∈ N_R.


Given a tree 𝔗 = (V, E, v₀, l) and v ∈ V, we denote by 𝔗(v) the subtree of 𝔗 that is rooted in v. If l(v₀) = {A₁, …, A_k} and v₁, …, vₙ are all the children of v₀, we can define the concept represented by 𝔗 recursively using C_𝔗 = A₁ ⊓ … ⊓ A_k ⊓ ∃r₁.C_{𝔗(v₁)} ⊓ … ⊓ ∃rₙ.C_{𝔗(vₙ)}, where for j ∈ {1, …, n}, v₀rⱼvⱼ ∈ E. Conversely, we can define 𝔗_C for a concept C = A₁ ⊓ … ⊓ A_k ⊓ ∃r₁.C₁ ⊓ … ⊓ ∃rₙ.Cₙ inductively based on the pairwise disjoint description trees 𝔗_{Cᵢ} = (Vᵢ, Eᵢ, vᵢ, lᵢ), i ∈ {1, …, n}. Specifically, 𝔗_C = (V_C, E_C, v_C, l_C), where

$$
\begin{aligned}
V_C &= \{v_C\} \cup \textstyle\bigcup_{i=1}^{n} V_i, & l_C(v) &= l_i(v) \text{ for } v \in V_i,\\
E_C &= \{v_C\, r_i\, v_i \mid 1 \le i \le n\} \cup \textstyle\bigcup_{i=1}^{n} E_i, & l_C(v_C) &= \{A_1, \dots, A_k\}.
\end{aligned}
$$

[Figure: description trees of D₁ and D₂. In each tree the root has an employment-edge and a qualification-edge; in the tree of D₁ these lead to nodes labeled Chair and PhD, in the tree of D₂ to nodes labeled ResearchPosition and Diploma.]

Fig. 1. Description trees of D₁ (left) and D₂ (right).
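The two directions of this correspondence, 𝔗_C from C and C_𝔗 from 𝔗, can be sketched as follows. This is a minimal illustration assuming the same nested-tuple encoding of concepts as elsewhere in our examples, integer node ids, and edges stored as (v, r, w) triples; printing ∃r.(…) with explicit parentheses is a presentation choice of the sketch.

```python
from itertools import count

# Concepts as nested tuples (an encoding chosen for this sketch):
#   ("top",) | ("atom", A) | ("and", C, D) | ("exists", r, C)


def conjuncts(c):
    """Flatten nested conjunctions; ⊤ contributes no conjunct."""
    if c[0] == "and":
        return conjuncts(c[1]) + conjuncts(c[2])
    return [] if c[0] == "top" else [c]


def build_tree(concept):
    """Return 𝔗_C as (V, E, root, labels) with integer node ids."""
    V, E, labels = [], [], {}
    ids = count()

    def visit(c):
        v = next(ids)
        V.append(v)
        labels[v] = set()
        for conj in dict.fromkeys(conjuncts(c)):  # conjunctions behave as sets
            if conj[0] == "atom":
                labels[v].add(conj[1])
            else:  # ("exists", r, C'): an r-labeled edge to the tree of C'
                E.append((v, conj[1], visit(conj[2])))
        return v

    root = visit(concept)
    return V, E, root, labels


def concept_of(tree, v=None):
    """C_𝔗(v): read the concept back from the subtree rooted at v."""
    _, E, root, labels = tree
    v = root if v is None else v
    parts = sorted(labels[v])
    parts += [f"∃{r}.({concept_of(tree, w)})" for (u, r, w) in E if u == v]
    return " ⊓ ".join(parts) if parts else "⊤"
```

For D₁ = ∃employment.Chair ⊓ ∃qualification.PhD this yields a root with two children labeled {Chair} and {PhD}, matching the left tree in Fig. 1.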


If 𝒯 = ∅, then subsumption between EL concepts is characterized by the existence of a homomorphism between the corresponding description trees [3]. We generalise this notion to also take the TBox into account.


Definition 4. Let T1 = (V1, E1, v0, l1) and T2 = (V2, E2, w0, l2) be two description trees and 𝒯 a TBox. A mapping φ: V2 → V1 is a 𝒯-homomorphism from T2 to T1 if and only if the following conditions are satisfied:

1. φ(w0) = v0,
2. φ(v) r φ(w) ∈ E1 for all vrw ∈ E2,
3. for every v ∈ V1 and w ∈ V2 with v = φ(w), 𝒯 ⊨ ⊓l1(v) ⊑ ⊓l2(w).

If only 1 and 2 are satisfied, then φ is called a weak homomorphism.


𝒯-homomorphisms for a given TBox 𝒯 capture subsumption w.r.t. 𝒯: if there exists a 𝒯-homomorphism φ from T2 to T1, then 𝒯 ⊨ C_{T1} ⊑ C_{T2}. This can be shown easily by structural induction using the definitions [16]. The weak homomorphism is the structure on which a 𝒯-homomorphism can be built by adding some hypothesis H to 𝒯. It is used to reveal missing links between a subsumee D2 of C2 and a subsumer D1 of C1 that can be added using H.


Example 5. Consider the concepts

D1 = ∃employment.Chair ⊓ ∃qualification.PhD
D2 = ∃employment.ResearchPosition ⊓ ∃qualification.Diploma

from the academia example. Figure 1 illustrates description trees for D1 (left) and D2 (right). The curved arrows show a weak homomorphism from T_{D2} to T_{D1} that can be strengthened into a 𝒯-homomorphism for some TBox 𝒯 that corresponds to the set of CIs in H_a1 ∪ {⊤ ⊑ ⊤}. The figure can also be used to illustrate what we mean by connection minimality: in order to create a connection between D1 and D2, we should add only the CIs from H_a1 ∪ {⊤ ⊑ ⊤}, and only those not already entailed by 𝒯_a. In practice, this means the weak homomorphism from T_{D2} to T_{D1} becomes a (𝒯_a ∪ H_a1)-homomorphism.


To address point 3), we define a partial order ≤_⊓ on concepts, s.t. C ≤_⊓ D if we can turn D into C by removing conjuncts in subexpressions, e.g., ∃r'.B ≤_⊓ ∃r.A ⊓ ∃r'.(B ⊓ B'). Formally, this is achieved by the following definition.


Definition 6. Let C and D be arbitrary concepts. Then C ≤_⊓ D if either:

• C = D,
• D = D' ⊓ D'', and C ≤_⊓ D', or
• C = ∃r.C', D = ∃r.D' and C' ≤_⊓ D'.
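A sketch of ≤_⊓ under the prose reading "turn D into C by removing conjuncts in subexpressions", with conjunctions treated as unordered (i.e., modulo associativity and commutativity). This is an illustrative assumption, not the paper's code: a concept is a tuple of conjuncts, each conjunct being an atomic name or a pair (role, filler) for ∃role.filler, and the empty tuple stands for ⊤. Each conjunct of C must come from a distinct conjunct of D.

```python
def conjunct_leq(c, d):
    """Conjunct c arises from conjunct d by deleting conjuncts inside d's filler."""
    if isinstance(c, str) or isinstance(d, str):   # atomic conjuncts must coincide
        return c == d
    (r, c_filler), (s, d_filler) = c, d            # ∃r.C' against ∃s.D'
    return r == s and leq(c_filler, d_filler)


def leq(c, d):
    """C ≤⊓ D: match every conjunct of C injectively to a conjunct of D."""
    def match(i, used):
        if i == len(c):
            return True
        return any(j not in used and conjunct_leq(c[i], d[j]) and match(i + 1, used | {j})
                   for j in range(len(d)))
    return match(0, frozenset())


# ∃r'.B ≤⊓ ∃r.A ⊓ ∃r'.(B ⊓ B'), the example from the text
C = (("r'", ("B",)),)
D = (("r", ("A",)), ("r'", ("B", "B'")))
```

Here `leq(C, D)` holds while `leq(D, C)` does not, reflecting that only deletions are allowed.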


We can finally capture our ideas on connection minimality formally. 


Definition 7 (Connection-Minimal Abduction). Given an abduction problem (𝒯, Σ, C1 ⊑ C2), a hypothesis H is connection-minimal if there exist concepts D1 and D2 built over Σ ∪ N_R and a mapping φ satisfying each of the following conditions:

1. 𝒯 ⊨ C1 ⊑ D1,
2. D2 is a ≤_⊓-minimal concept s.t. 𝒯 ⊨ D2 ⊑ C2,
3. φ is a weak homomorphism from the tree T_{D2} = (V2, E2, w0, l2) to the tree T_{D1} = (V1, E1, v0, l1), and
4. H = {⊓l1(φ(w)) ⊑ ⊓l2(w) | w ∈ V2 ∧ 𝒯 ⊭ ⊓l1(φ(w)) ⊑ ⊓l2(w)}.


H is additionally called packed if the left-hand sides of the CIs in H cannot hold more conjuncts than they do, which is formally stated as: for H, there is no H' defined from the same D2 and a D1' and φ' s.t. there is a node w ∈ V2 for which

l1(φ(w)) ⊊ l1'(φ'(w)) and l1(φ(w')) = l1'(φ'(w')) for all w' ≠ w.


Straightforward consequences of Definition 7 include that φ is a (𝒯 ∪ H)-homomorphism from T_{D2} to T_{D1} and that D1 and D2 are connecting concepts from C1 to C2 in 𝒯 ∪ H, so that 𝒯 ∪ H ⊨ C1 ⊑ C2 as wanted [16]. With the help of Fig. 1 and Example 5, one easily establishes that hypothesis H_a1 is connection-minimal, and even packed. Connection-minimality rejects H_a2, as a single 𝒯'-homomorphism for some 𝒯' between two concepts D1 and D2 would be insufficient: we would need two weak homomorphisms, one linking Professor to FundsProvider and another linking ∃writes.GrantApplication to ∃writes.ResearchPaper.
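The following sketch, under our own assumptions, enumerates weak homomorphisms (conditions 1 and 2 of Definition 4) by brute force and collects, in the spirit of condition 4 of Definition 7, the CIs ⊓l1(φ(w)) ⊑ ⊓l2(w) that are not already entailed. Trees are encoded as hypothetical (label, edges, root) triples. Since 𝒯_a is not listed here, the entailment check is replaced by a tautology test (the right-hand names are a subset of the left-hand names), which is only exact for the empty TBox. On the trees of Example 5 it returns the two CIs of H_a1 and drops the tautology ⊤ ⊑ ⊤ at the roots.

```python
from itertools import product


def weak_homomorphisms(t2, t1):
    """Yield every weak homomorphism phi: V2 -> V1 (conditions 1 and 2 of Def. 4).

    A tree is a triple (label, edges, root): label maps each node to a frozenset
    of concept names and edges is a list of (v, role, w) triples."""
    _, edges2, root2 = t2
    _, edges1, root1 = t1

    def extend(w, v):
        # every way of mapping the subtree of T2 below w into T1 below v
        options = []
        for (u, role, w_child) in edges2:
            if u == w:
                targets = [v_child for (x, s, v_child) in edges1 if x == v and s == role]
                options.append([m for v_child in targets for m in extend(w_child, v_child)])
        for combination in product(*options):
            phi = {w: v}
            for partial in combination:
                phi.update(partial)
            yield phi

    yield from extend(root2, root1)


def missing_cis(t2, t1, phi, entailed):
    """Pairs (l1(phi(w)), l2(w)), read as ⊓l1(phi(w)) ⊑ ⊓l2(w), that are not entailed."""
    label2, label1 = t2[0], t1[0]
    return {(label1[phi[w]], label2[w]) for w in label2
            if not entailed(label1[phi[w]], label2[w])}


def tautology(lhs, rhs):
    # Stand-in for the TBox check: exact only for the empty TBox.
    return rhs <= lhs


# Description trees of Example 5; node 0 is the root in both.
T_D1 = ({0: frozenset(), 1: frozenset({"Chair"}), 2: frozenset({"PhD"})},
        [(0, "employment", 1), (0, "qualification", 2)], 0)
T_D2 = ({0: frozenset(), 1: frozenset({"ResearchPosition"}), 2: frozenset({"Diploma"})},
        [(0, "employment", 1), (0, "qualification", 2)], 0)

phi = next(weak_homomorphisms(T_D2, T_D1))
H = missing_cis(T_D2, T_D1, phi, tautology)
```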


4 Computing Connection-Minimal Hypotheses Using Prime Implicates


To compute connection-minimal hypotheses in practice, we propose a method based on first-order prime implicates, which can be derived by resolution.

[Fig. 2: diagram of the method: the abduction problem is translated into a set of first-order clauses, prime implicates are generated from it (PI generation), and they are recombined into the set S of hypotheses.]
Fig. 2. EL abduction using prime implicate generation in FOL.

We assume the reader is familiar with the basics of first-order resolution, and do not reintroduce notions of clauses, Skolemization and resolution inferences here (for details, see [4]). In our context, every term is built on variables, denoted x, y, a single constant sk0 and unary Skolem functions usually denoted sk, possibly annotated. Prime implicates are defined as follows.


Definition 8 (Prime Implicate). Let Φ be a set of clauses. A clause φ is an implicate of Φ if Φ ⊨ φ. Moreover, φ is prime if for any other implicate φ' of Φ s.t. φ' ⊨ φ, it also holds that φ ⊨ φ'.


Let Σ ⊆ N_C be a set of unary predicates. Then PI⁺_Σ(Φ) denotes the set of all positive ground prime implicates of Φ that only use predicate symbols from Σ ∪ N_R, while PI⁻_Σ(Φ) denotes the set of all negative ground prime implicates of Φ that only use predicate symbols from Σ ∪ N_R.


Example 9. Given a set of clauses Φ = {A1(sk0), ¬B1(sk0), ¬A1(x) ∨ r(x, sk(x)), ¬A1(x) ∨ A2(sk(x)), ¬B2(x) ∨ ¬r(x, y) ∨ ¬B3(y) ∨ B1(x)}, the ground prime implicates of Φ for Σ = N_C are, on the positive side, PI⁺_Σ(Φ) = {A1(sk0), A2(sk(sk0)), r(sk0, sk(sk0))} and, on the negative side, PI⁻_Σ(Φ) = {¬B1(sk0), ¬B2(sk0) ∨ ¬B3(sk(sk0))}. They are implicates because all of them are entailed by Φ. For a ground implicate φ, another ground implicate φ' such that φ' ⊨ φ and φ ⊭ φ' can only be obtained from φ by dropping literals. Such an operation does not produce another implicate for any of the clauses presented above as belonging to PI⁺_Σ(Φ) and PI⁻_Σ(Φ), thus they really are all prime.
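Example 9 can be checked mechanically because Φ is Horn: its entailed ground atoms are those of the least Herbrand model, and Φ ⊨ ¬L1 ∨ ... ∨ ¬Lk holds iff adding L1, ..., Lk makes the goal clause ¬B1(sk0) fail. The sketch below hard-codes the three definite clauses of this particular Φ as forward-chaining rules (an assumption of this illustration, not a general prover). It confirms the positive implicates, the entailment of ¬B2(sk0) ∨ ¬B3(sk(sk0)), and that dropping either literal breaks the entailment, i.e., primality.

```python
def sk(t):
    return f"sk({t})"


def least_model(facts):
    """Forward chaining with the definite clauses of Example 9, read as rules:
    A1(x) -> r(x, sk(x));  A1(x) -> A2(sk(x));  B2(x), r(x, y), B3(y) -> B1(x)."""
    model = set(facts)
    while True:
        new = set()
        for atom in model:
            if atom[0] == "A1":
                x = atom[1]
                new |= {("r", x, sk(x)), ("A2", sk(x))}
            elif atom[0] == "r":
                _, x, y = atom
                if ("B2", x) in model and ("B3", y) in model:
                    new.add(("B1", x))
        if new <= model:
            return model
        model |= new


def entails_negative(atoms):
    """Phi |= ~L1 v ... v ~Lk iff Phi plus L1, ..., Lk is unsatisfiable, i.e. the
    least model then contains B1(sk0), violating the goal clause ~B1(sk0)."""
    return ("B1", "sk0") in least_model({("A1", "sk0")} | set(atoms))
```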


To generate hypotheses, we translate the abduction problem into a set of first-order clauses, from which we can infer prime implicates that we then combine to obtain the result, as illustrated in Fig. 2. In more detail: we first translate the problem into a set Φ of Horn clauses. Prime implicates can be computed using an off-the-shelf tool [13,28] or, in our case, a slight extension of the resolution-based version of the SPASS theorem prover [34] using the set-of-support strategy and some added features described in Sect. 5. Since Φ is Horn, PI⁺_Σ(Φ) contains only unit clauses. A final recombination step looks at the clauses in PI⁻_Σ(Φ) one after the other. These correspond to candidates for the connecting concepts D2 of Definition 7. Recombination attempts to match each literal in one such clause with unit clauses from PI⁺_Σ(Φ). If such a match is possible, it produces a suitable D1 to match D2, and allows the creation of a solution to the abduction problem. The set S contains all the hypotheses thus obtained.

In what follows, we present our translation of abduction problems into first-order logic and formalize the construction of hypotheses from the prime implicates of this translation. We then show how to obtain termination for the prime implicate generation process with soundness and completeness guarantees on the solutions computed.


Abduction Method. We assume the EL TBox in the input is in normal form as defined, e.g., by Baader et al. [2]. Thus every CI is of one of the following forms:

A ⊑ B     A1 ⊓ A2 ⊑ B     ∃r.A ⊑ B     A ⊑ ∃r.B

where A, A1, A2, B ∈ N_C ∪ {⊤}.
The use of normalization is justified by the following lemma.


Lemma 10. For every EL TBox 𝒯, we can compute in polynomial time an EL TBox 𝒯' in normal form such that for every other TBox H and every CI C ⊑ D that use only names occurring in 𝒯, we have 𝒯 ∪ H ⊨ C ⊑ D iff 𝒯' ∪ H ⊨ C ⊑ D.


After the normalisation, we eliminate occurrences of ⊤, replacing this concept everywhere by the fresh atomic concept A_⊤. We furthermore add ∃r.A_⊤ ⊑ A_⊤ and B ⊑ A_⊤ to 𝒯 for every role r and atomic concept B occurring in 𝒯. This simulates the semantics of ⊤ for A_⊤, namely the implicit property that C ⊑ ⊤ holds for any C no matter what the TBox is. In particular, this ensures that whenever there is a positive prime implicate B(t) or r(t, t'), A_⊤(t) also becomes a prime implicate. Note that normalisation and ⊤ elimination extend the signature, and thus potentially the solution space of the abduction problem. This is remedied by intersecting the set of abducible predicates Σ with the signature of the original input ontology. We assume that 𝒯 is in normal form and without ⊤ in the rest of the paper.

We denote by 𝒯⁻ the result of renaming all atomic concepts A in 𝒯 using fresh duplicate symbols A⁻. This renaming is done only on concepts but not on roles, and on C2 but not on C1 in the observation. This ensures that the literals in a clause of PI⁻_Σ(Φ) all relate to the conjuncts of a ≤_⊓-minimal subsumee of C2. Without it, some of these conjuncts would not appear in the negative implicates due to the presence of their positive counterparts as atoms in PI⁺_Σ(Φ). The translation of the abduction problem (𝒯, Σ, C1 ⊑ C2) is defined as the Skolemization of

$$\pi(\mathcal{T} \cup \mathcal{T}^-) \wedge \neg\pi(C_1 \sqsubseteq C_2^-)$$

where sk0 is used as the unique fresh Skolem constant such that the Skolemization of ¬π(C1 ⊑ C2⁻) results in {C1(sk0), ¬C2⁻(sk0)}. This translation is usually denoted Φ and always considered in clausal normal form.
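As an illustration, the sketch below produces the clauses of π(𝒯 ∪ 𝒯⁻) ∧ ¬π(C1 ⊑ C2⁻) for a TBox in normal form, assuming atomic C1 and C2. The axiom tags (`sub`, `conj`, `ex_lhs`, `ex_rhs`), the ASCII spellings `~` for ¬ and `A^-` for A⁻, and the numbering sk1, sk2, ... of Skolem functions are hypothetical choices made for this example only.

```python
from itertools import count


def dup(name):
    """Hypothetical ASCII spelling of the duplicate symbol A⁻."""
    return name + "^-"


def clauses_of(axiom, rename, fresh):
    """Clauses of π(axiom) for a normal-form CI; '~' marks negation."""
    kind = axiom[0]
    if kind == "sub":                              # A ⊑ B
        _, a, b = axiom
        return [(f"~{rename(a)}(x)", f"{rename(b)}(x)")]
    if kind == "conj":                             # A1 ⊓ A2 ⊑ B
        _, a1, a2, b = axiom
        return [(f"~{rename(a1)}(x)", f"~{rename(a2)}(x)", f"{rename(b)}(x)")]
    if kind == "ex_lhs":                           # ∃r.A ⊑ B (roles are never renamed)
        _, r, a, b = axiom
        return [(f"~{r}(x,y)", f"~{rename(a)}(y)", f"{rename(b)}(x)")]
    if kind == "ex_rhs":                           # A ⊑ ∃r.B, Skolemized with a fresh sk_i
        _, a, r, b = axiom
        f = f"sk{next(fresh)}"
        return [(f"~{rename(a)}(x)", f"{r}(x,{f}(x))"),
                (f"~{rename(a)}(x)", f"{rename(b)}({f}(x))")]
    raise ValueError(f"not in normal form: {axiom!r}")


def translate(tbox, c1, c2):
    """Clausal form of π(T ∪ T⁻) ∧ ¬π(C1 ⊑ C2⁻) for atomic C1 and C2."""
    fresh = count(1)
    phi = []
    for rename in (lambda n: n, dup):              # T first, then its duplicate copy T⁻
        for axiom in tbox:
            phi += clauses_of(axiom, rename, fresh)
    return phi + [(f"{c1}(sk0)",), (f"~{dup(c2)}(sk0)",)]
```

Applied to the cyclic TBox {C1 ⊑ A, A ⊑ ∃r.A, ∃r.B ⊑ B, B ⊑ C2} used later for termination, this yields 12 clauses, ending with the two ground clauses C1(sk0) and ¬C2⁻(sk0).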


Theorem 11. Let (𝒯, Σ, C1 ⊑ C2) be an abduction problem and Φ be its first-order translation. Then, a TBox H' is a packed connection-minimal solution to the problem if and only if an equivalent hypothesis H can be constructed from non-empty sets 𝒜 and ℬ of atoms verifying:

• ℬ = {B1(t1), ..., Bm(tm)} s.t. (¬B1⁻(t1) ∨ ... ∨ ¬Bm⁻(tm)) ∈ PI⁻_Σ(Φ),
• for all t ∈ {t1, ..., tm} there exists an A s.t. A(t) ∈ PI⁺_Σ(Φ),
• 𝒜 = {A(t) ∈ PI⁺_Σ(Φ) | t is one of t1, ..., tm}, and
• H = {C_{𝒜,t} ⊑ C_{ℬ,t} | t is one of t1, ..., tm and C_{ℬ,t} ⋣ C_{𝒜,t}}, where C_{𝒜,t} = ⊓_{A(t) ∈ 𝒜} A and C_{ℬ,t} = ⊓_{B(t) ∈ ℬ} B.


We call the hypotheses that are constructed as in Theorem 11 constructible. This theorem states that every packed connection-minimal hypothesis is equivalent to a constructible hypothesis and vice versa. A constructible hypothesis is built from the concepts in one negative prime implicate in PI⁻_Σ(Φ) and all matching concepts from prime implicates in PI⁺_Σ(Φ). The matching itself is determined by the Skolem terms that occur in all these clauses. The subterm relation between the terms of the clauses in PI⁺_Σ(Φ) and PI⁻_Σ(Φ) is the same as the ancestor relation in the description trees of subsumers of C1 and subsumees of C2, respectively. The terms matching in positive and negative prime implicates allow us to identify where the missing entailments between a subsumer D1 of C1 and a subsumee D2 of C2 are. These missing entailments become the constructible H. The condition C_{ℬ,t} ⋣ C_{𝒜,t} is a way to write that C_{𝒜,t} ⊑ C_{ℬ,t} is not a tautology, which can be tested by subset inclusion.

The formal proof of this result is detailed in the technical report [16]. We sketch it briefly here. To start, we link the subsumers of C1 with PI⁺_Σ(Φ). This is done at the semantic level: we show that all Herbrand models of Φ, i.e., models built on the symbols in Φ, are also models of PI⁺_Σ(Φ), which is itself such a model. Then we show that C1(sk0) as well as the formulas corresponding to the subsumers of C1 in our translation are satisfied by all Herbrand models. This follows from the fact that Φ is a set of Horn clauses. Next, we show, using a similar technique, how duplicate negative ground implicates, not necessarily prime, relate to subsumees of C2, with the restriction that there must exist a weak homomorphism from a description tree of the considered subsumee of C2 to a description tree of a subsumer of C1. Thus, H provides the missing CIs that turn the weak homomorphism into a (𝒯 ∪ H)-homomorphism. Then, we establish an equivalence between the ≤_⊓-minimality of the subsumee of C2 and the primality of the corresponding negative implicate. Packability is the last aspect we deal with, and it only matters for the reconstruction. It holds because 𝒜 contains all A(t) ∈ PI⁺_Σ(Φ) for all terms t occurring in ℬ.


Example 12. Consider the abduction problem (𝒯_a, Σ, α_a) where Σ contains all concepts from 𝒯_a. For the translation Φ of this problem, we have

PI⁺_Σ(Φ) = {Professor(sk0), Doctor(sk0), Chair(sk1(sk0)), PhD(sk2(sk0))}
PI⁻_Σ(Φ) = {¬Researcher⁻(sk0), ¬ResearchPosition⁻(sk1(sk0)) ∨ ¬Diploma⁻(sk2(sk0))}

where sk1 is the Skolem function introduced for Professor ⊑ ∃employment.Chair and sk2 is introduced for Doctor ⊑ ∃qualification.PhD. This leads to two constructible solutions, {Professor ⊓ Doctor ⊑ Researcher} and H_a1, which are both packed connection-minimal hypotheses if Σ = N_C. Another example is presented in full detail in the technical report [16].
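The construction of Theorem 11 is simple enough to sketch directly. The Python below is an illustration, not the CAPI implementation: it takes one negative prime implicate (with the duplicate symbols already mapped back to their original names, an assumption of this sketch) and the positive unit implicates, groups atoms by Skolem term, and drops tautological CIs by subset inclusion. Fed with the implicates of Example 12, it yields exactly the two constructible solutions named above.

```python
from collections import defaultdict


def construct(negative, positive):
    """Hypothesis of Theorem 11 from one negative prime implicate and the positive
    unit implicates. Literals are (concept, term) pairs. Returns None when some
    term of the negative implicate has no matching positive atom."""
    b_at = defaultdict(set)                     # term t -> names B with B(t) in the B-set
    for concept, term in negative:
        b_at[term].add(concept)
    a_at = defaultdict(set)                     # term t -> names A with A(t) in the A-set
    for concept, term in positive:
        if term in b_at:
            a_at[term].add(concept)
    if any(not a_at[t] for t in b_at):
        return None
    # C_{A,t} ⊑ C_{B,t}, skipped when trivial (every B-conjunct is an A-conjunct)
    return {(frozenset(a_at[t]), frozenset(b_at[t])) for t in b_at
            if not b_at[t] <= a_at[t]}


# Prime implicates of Example 12
positive = [("Professor", "sk0"), ("Doctor", "sk0"),
            ("Chair", "sk1(sk0)"), ("PhD", "sk2(sk0)")]
negatives = [[("Researcher", "sk0")],
             [("ResearchPosition", "sk1(sk0)"), ("Diploma", "sk2(sk0)")]]
solutions = [construct(neg, positive) for neg in negatives]
```

The first solution corresponds to {Professor ⊓ Doctor ⊑ Researcher}, the second to H_a1 = {Chair ⊑ ResearchPosition, PhD ⊑ Diploma}.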


Termination. If 𝒯 contains cycles, there can be infinitely many prime implicates. For example, for 𝒯 = {C1 ⊑ A, A ⊑ ∃r.A, ∃r.B ⊑ B, B ⊑ C2}, both the positive and negative ground prime implicates of Φ are unbounded even though the set of constructible hypotheses is finite (as it is for any abduction problem):

PI⁺_Σ(Φ) = {C1(sk0), A(sk0), A(sk(sk0)), A(sk(sk(sk0))), ...},
PI⁻_Σ(Φ) = {¬C2⁻(sk0), ¬B⁻(sk0), ¬B⁻(sk(sk0)), ...}.

To find all constructible hypotheses of an abduction problem, an approach that simply computes all prime implicates of Φ, e.g., using the standard resolution calculus, will never terminate on cyclic problems. However, if we look only for subset-minimal constructible hypotheses, termination can be achieved for cyclic and non-cyclic problems alike, because it is possible to construct all such hypotheses from prime implicates that have a polynomially bounded term depth, as shown below. To obtain this bound, we consider resolution derivations of the ground prime implicates and show that they can be done under some restrictions that imply this bound.
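The unbounded positive chain above can be reproduced with a small forward-chaining sketch. Since Φ is Horn, the positive ground implicates are the atoms derivable from C1(sk0); the function below derives them for a normal-form TBox but cuts Skolem terms off at a given nesting depth, so every additional level of depth adds one more A-atom. The axiom encoding and the Skolem names sk0, sk1, ... (one per axiom of the form A ⊑ ∃r.B, indexed by its position) are assumptions of this illustration, and only the positive side is computed.

```python
def nesting_depth(term):
    return term.count("(")                          # "sk1(sk1(sk0))" -> 2


def positive_atoms(tbox, c1, max_depth):
    """Ground atoms derivable from C1(sk0) by forward chaining over a normal-form
    TBox, with Skolem terms cut off at nesting depth max_depth."""
    concepts = {(c1, "sk0")}                        # (A, t) encodes A(t)
    roles = set()                                   # (r, s, t) encodes r(s, t)
    while True:
        new_c, new_r = set(), set()
        for (a, t) in concepts:
            for i, ax in enumerate(tbox):
                if ax[0] == "sub" and ax[1] == a:                        # A ⊑ B
                    new_c.add((ax[2], t))
                elif ax[0] == "conj" and ax[1] == a and (ax[2], t) in concepts:
                    new_c.add((ax[3], t))                                # A1 ⊓ A2 ⊑ B
                elif ax[0] == "ex_rhs" and ax[1] == a:                   # A ⊑ ∃r.B
                    s = f"sk{i}({t})"               # hypothetical: one Skolem function per axiom
                    if nesting_depth(s) <= max_depth:
                        new_r.add((ax[2], t, s))
                        new_c.add((ax[3], s))
        for (r, s, t) in roles:                                          # ∃r.A ⊑ B
            for ax in tbox:
                if ax[0] == "ex_lhs" and ax[1] == r and (ax[2], t) in concepts:
                    new_c.add((ax[3], s))
        if new_c <= concepts and new_r <= roles:
            return concepts, roles
        concepts |= new_c
        roles |= new_r


T = [("sub", "C1", "A"), ("ex_rhs", "A", "r", "A"), ("ex_lhs", "r", "B", "B"), ("sub", "B", "C2")]
concepts, roles = positive_atoms(T, "C1", 2)
```

With depth bound 2 this gives A(sk0), A(sk1(sk0)) and A(sk1(sk1(sk0))); raising the bound keeps producing new atoms, which is why a depth bound is needed for termination.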

Before performing resolution, we compute the presaturation Φ_p of the set of clauses Φ, defined as

$$\Phi_p = \Phi \cup \{\neg A(x) \vee B(x) \mid \Phi \models \neg A(x) \vee B(x)\}$$

where A and B are either both original or both duplicate atomic concepts. The presaturation can be efficiently computed before the translation, using a modern EL reasoner such as ELK [23], which is highly optimized towards the computation of all entailments of the form A ⊑ B. While the presaturation computes nothing a resolution procedure could not derive, it is what allows us to bound the maximal depth of terms in inferences by that in prime implicates. Since Φ_p is presaturated, we do not need to perform inferences that produce Skolem terms of a higher nesting depth than what is needed for the prime implicates.
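The implementation delegates presaturation to ELK. As a self-contained stand-in (not ELK's implementation or API), the sketch below applies the classical EL completion rules to a normal-form TBox without ⊤ and ⊥ to compute all atomic subsumptions A ⊑ B, and then emits the corresponding clauses ¬A(x) ∨ B(x) of Φ_p. It would be run once per copy of the signature (originals and duplicates); the axiom tags and function names are hypothetical.

```python
def classify(tbox, names):
    """EL completion (CR1-CR4) for a normal-form TBox without ⊤ and ⊥.
    Returns S with B in S[A] iff T ⊨ A ⊑ B, for atomic A, B in names."""
    S = {a: {a} for a in names}
    R = set()                                       # (r, A, B): T ⊨ A ⊑ ∃r.B
    changed = True

    def add(a, b):
        nonlocal changed
        if b not in S[a]:
            S[a].add(b)
            changed = True

    while changed:
        changed = False
        for a in names:
            for ax in tbox:
                if ax[0] == "sub" and ax[1] in S[a]:                        # CR1
                    add(a, ax[2])
                elif ax[0] == "conj" and ax[1] in S[a] and ax[2] in S[a]:   # CR2
                    add(a, ax[3])
                elif ax[0] == "ex_rhs" and ax[1] in S[a] and (ax[2], a, ax[3]) not in R:
                    R.add((ax[2], a, ax[3]))                                # CR3
                    changed = True
        for (r, a, b) in list(R):                                           # CR4
            for ax in tbox:
                if ax[0] == "ex_lhs" and ax[1] == r and ax[2] in S[b]:
                    add(a, ax[3])
    return S


def presaturation(tbox, names):
    """The extra clauses ~A(x) v B(x) of Φ_p for one copy of the signature."""
    S = classify(tbox, names)
    return {(f"~{a}(x)", f"{b}(x)") for a in names for b in S[a] if b != a}
```

For instance, from A ⊑ ∃r.B, B ⊑ C and ∃r.C ⊑ D, the procedure derives A ⊑ D, which a resolution procedure would only reach through a Skolem term.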

Starting from the presaturated set Φ_p, we can show that all the relevant prime implicates can be computed if we restrict all inferences to those where

R1 at least one premise contains a ground term,

R2 the resolvent contains at most one variable, and

R3 every literal in the resolvent contains Skolem terms of nesting depth at most n × m, where n is the number of atomic concepts in Φ, and m is the number of occurrences of existential role restrictions in 𝒯.
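Restrictions R1–R3 are syntactic checks on a single inference, so they are easy to state as code. In this sketch, terms are the variables x, y, the constant sk0, or pairs (f, t) standing for a unary Skolem application f(t), and literals are (positive?, predicate, arguments) triples. This encoding and the function names are our own; the modified SPASS enforces the restrictions through its own flags, which are not reproduced here.

```python
VARIABLES = {"x", "y"}                              # as in the text: variables x, y


def variables(term):
    """Variables of a term; ("sk1", t) encodes the Skolem application sk1(t)."""
    if isinstance(term, tuple):
        return variables(term[1])
    return {term} if term in VARIABLES else set()


def nesting(term):
    return 1 + nesting(term[1]) if isinstance(term, tuple) else 0


def arguments(clause):
    """All argument terms of a clause given as (positive?, predicate, args) literals."""
    return [t for (_, _, args) in clause for t in args]


def respects_restrictions(premises, resolvent, n, m):
    r1 = any(not variables(t) for p in premises for t in arguments(p))  # a ground term
    r2 = len(set().union(*map(variables, arguments(resolvent)))) <= 1   # at most one variable
    r3 = all(nesting(t) <= n * m for t in arguments(resolvent))         # depth bound n × m
    return r1 and r2 and r3
```

For example, resolving ¬B⁻(sk0) with ¬r(x, y) ∨ ¬B⁻(y) ∨ B⁻(x) into ¬r(sk0, y) ∨ ¬B⁻(y) passes all three checks, whereas an inference between two non-ground clauses violates R1.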


The first restriction turns the derivation of PI⁺_Σ(Φ) and PI⁻_Σ(Φ) into an SOS-resolution derivation [18] with set of support {C1(sk0), ¬C2⁻(sk0)}, i.e., the only two clauses with ground terms in Φ. This restriction is a straightforward consequence of our interest in computing only ground implicates, and of the fact that the non-ground clauses in Φ cannot entail the empty clause since every EL TBox is consistent. The other restrictions are consequences of the following theorems, whose proofs are available in the technical report [16].
Theorem 13. Given an abduction problem and its translation Φ, every constructible hypothesis can be built from prime implicates that are inferred under restriction R2.


In fact, for PI⁺_Σ(Φ) it is even possible to restrict inferences to generating only ground resolvents, as can be seen in the proof of Theorem 13, which directly looks at the kinds of clauses that are derivable by resolution from Φ.


Theorem 14. Given an abduction problem and its translation Φ, every subset-minimal constructible hypothesis can be built from prime implicates that have a nesting depth of at most n × m, where n is the number of atomic concepts in Φ, and m is the number of occurrences of existential role restrictions in 𝒯.


The proof of Theorem 14 is based on a structure called a solution tree, which resembles a description tree, but with multiple labeling functions. It assigns to each node a Skolem term, a set of atomic concepts called positive label, and a single atomic concept called negative label. The nodes correspond to matching partners in a constructible hypothesis: the Skolem term is the term on which we match literals. The positive label collects the atomic concepts in the positive prime implicates containing that term. The maximal anti-chains of the tree, i.e., the maximal subsets of nodes s.t. no node is the ancestor of another, are such that their negative labels correspond to the literals in a derivable negative implicate. For every solution tree, the Skolem labels and negative labels of the leaves determine a negative prime implicate, and by combining the positive and negative labels of these leaves, we obtain a constructible hypothesis, called the solution of the tree. We show that from every solution tree with solution H we can obtain a solution tree with solution H' ⊆ H s.t. on no path are there two nodes that agree both on the head of their Skolem labeling and on the negative label. Furthermore, the number of head functions of Skolem labels is bounded by the total number m of Skolem functions, while the number of distinct negative labels is bounded by the number n of atomic concepts, bounding the depth of the solution tree for H' by n × m. This justifies the bound in Theorem 14. This bound is rather loose: for the academia example, it is equal to 22 × 6 = 132.


5 Implementation 


We implemented our method to compute all subset-minimal constructible hypotheses in the tool CAPI.³ To compute the prime implicates, we used SPASS [34], a first-order theorem prover that includes resolution among other calculi. We implemented everything before and after the prime implicate computation in Java, including the parsing of ontologies, preprocessing (detailed below), clausification of the abduction problems, translation to SPASS input, as well as the parsing and processing of the output of SPASS to build the constructible hypotheses and filter out the non-subset-minimal ones. On the Java side, we used the OWL API for all DL-related functionalities [20], and the EL reasoner ELK for computing the presaturations [23].


³ Available under https://lat.inf.tu-dresden.de/~koopmann/CAPI.
Preprocessing. Since realistic TBoxes can be too large to be processed by SPASS, we replace the background knowledge in the abduction problem by a subset of axioms relevant to the abduction problem. Specifically, we replace the abduction problem (𝒯, Σ, C1 ⊑ C2) by the abduction problem (M^⊥_{C1} ∪ M^⊤_{C2}, Σ, C1 ⊑ C2), where M^⊥_{C1} is the ⊥-module of 𝒯 for the signature of C1, and M^⊤_{C2} is the ⊤-module of 𝒯 for the signature of C2 [15]. Those notions are explained in the technical report [16]. Their relevant properties are that M^⊥_{C1} is a subset of 𝒯 s.t. M^⊥_{C1} ⊨ C1 ⊑ D iff 𝒯 ⊨ C1 ⊑ D for all concepts D, while M^⊤_{C2} is a subset of 𝒯 that ensures M^⊤_{C2} ⊨ D ⊑ C2 iff 𝒯 ⊨ D ⊑ C2 for all concepts D. It immediately follows that every connection-minimal hypothesis for the original problem (𝒯, Σ, C1 ⊑ C2) is also a connection-minimal hypothesis for (M^⊥_{C1} ∪ M^⊤_{C2}, Σ, C1 ⊑ C2). For the presaturation, we compute with ELK all CIs of the form A ⊑ B s.t. M^⊥_{C1} ∪ M^⊤_{C2} ⊨ A ⊑ B.


Prime implicates generation. We rely on a slightly modified version of SPASS v3.9 to compute all ground prime implicates. In particular, we added the possibility to limit the number of variables allowed in the resolvents to enforce R2. For each of the restrictions R1–R3 there is a corresponding flag (or set of flags) that is passed to SPASS as an argument.


Recombination. The construction of hypotheses from the prime implicates found in the previous stage starts with a straightforward process of matching negative prime implicates with a set of positive ones based on their Skolem terms. It is followed by subset-minimality tests that discard non-subset-minimal hypotheses since, with the bound we enforce, there is no guarantee that these are valid constructible hypotheses: the negative ground implicates they are built upon may not be prime. If SPASS terminates due to a timeout instead of reaching the bound, then it is possible that some subset-minimal constructible hypotheses are not found, and thus some non-constructible hypotheses may be kept. Note that these are in any case solutions to the abduction problem.
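The subset-minimality filter of this stage amounts to discarding every hypothesis that strictly contains another one. A minimal sketch, with hypotheses modeled as sets of CIs written as plain strings (an illustrative assumption, not the Java implementation):

```python
def subset_minimal(hypotheses):
    """Keep the hypotheses (sets of CIs) that have no proper subset among the others."""
    hyps = {frozenset(h) for h in hypotheses}       # also removes duplicates
    return [h for h in hyps if not any(other < h for other in hyps)]
```

For example, {"Chair ⊑ ResearchPosition"} survives while its superset {"Chair ⊑ ResearchPosition", "PhD ⊑ Diploma"} is discarded.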


6 Experiments 


There is no benchmark suite dedicated to TBox abduction in EL, so we created our own, using realistic ontologies from the bio-medical domain. For this, we used ontologies from the 2017 snapshot of Bioportal [27]. We restricted each ontology to its EL fragment by filtering out unsupported axioms, where we replaced domain axioms and n-ary equivalence axioms in the usual way [2]. Note that, even if the ontology contains more expressive axioms, an EL hypothesis is still useful if found. From the resulting set of TBoxes, we selected those containing at least 1 and at most 50,000 axioms, resulting in a set of 387 EL TBoxes. Precisely, they contained between 2 and 46,429 axioms, for an average of 3,039 and a median of 569. Towards obtaining realistic benchmarks, we created three different categories of abduction problems for each ontology 𝒯, where in each case we used the signature of the entire ontology for Σ.
• Problems in ORIGIN use 𝒯 as background knowledge, and as observation a randomly chosen A ⊑ B s.t. A and B are in the signature of 𝒯 and 𝒯 ⊭ A ⊑ B. This covers the basic requirements of an abduction problem, but has the disadvantage that A and B can be completely unrelated in 𝒯.

• Problems in JUSTIF contain as observation a randomly selected CI α s.t., for the original TBox, 𝒯 ⊨ α and α ∉ 𝒯. The background knowledge used is a justification for α in 𝒯 [32], that is, a minimal subset 𝒥 ⊆ 𝒯 s.t. 𝒥 ⊨ α, from which a randomly selected axiom is removed. The TBox is thus a smaller set of axioms extracted from a real ontology for which we know there is a way of producing the required entailment without adding it explicitly. Justifications were computed using functionalities of the OWL API and ELK.

• Problems in REPAIR contain as observation a randomly selected CI α s.t. 𝒯 ⊨ α, and as background knowledge a repair for α in 𝒯, which is a maximal subset ℛ ⊆ 𝒯 s.t. ℛ ⊭ α. Repairs were computed using a justification-based algorithm [32] with justifications computed as for JUSTIF. This usually resulted in much larger TBoxes, where more axioms would be needed to establish the entailment.


All experiments were run on Debian Linux (Intel Core i5-4590, 3.30 GHz, 23 GB Java heap size). The code and scripts used in the experiments are available online [17]. The three phases of the method (see Fig. 2) were each assigned a hard time limit of 90 s.

For each ontology, we attempted to create and translate 5 abduction problems of each category. This failed on some ontologies because either there was no corresponding entailment (25/28/25 failures out of the 387 ontologies for ORIGIN/JUSTIF/REPAIR), there was a timeout during the translation (5/5/5 failures for ORIGIN/JUSTIF/REPAIR), or the computation of justifications caused an exception (-/2/0 failures for ORIGIN/JUSTIF/REPAIR). The final number of abduction problems for each category is in the first column of Table 1.

We then attempted to compute prime implicates for these benchmarks using SPASS. In addition to the hard time limit, we gave a soft time limit of 30 s to SPASS, after which it should stop exploring the search space and return the implicates already found. In Table 1 we show, for each category, the percentage of problems on which SPASS succeeded in computing a non-empty set of clauses (Success) and the percentage of problems on which SPASS terminated within the time limit, where all solutions are computed (Compl.). The high number of CIs in the background knowledge explains most of the cases where SPASS reached the soft time limit. In many of these cases, the bound on the term depth goes into the billions, rendering it useless in practice. However, the "Compl." column shows that the bound is reached before the soft time limit in most cases.

The reconstruction never reached the hard time limit. We measured the median, average and maximal number of solutions found (#H), size of solutions in number of CIs (|H|), size of CIs from solutions in number of atomic concepts (|α|), and SPASS runtime (time, in seconds), all reported in Table 1. Except for the simple JUSTIF problems, the number of solutions may become very large. At the same time, solutions always contain very few axioms (never more than 3), though the axioms become large too. We also noticed that highly nested Skolem terms rarely lead to more hypotheses being found: 8/1/15 for ORIGIN/JUSTIF/REPAIR, and the largest nesting depth used was 3/1/2 for ORIGIN/JUSTIF/REPAIR. This hints at the fact that longer time limits would not have produced more solutions, and motivates future research into redundancy criteria to stop derivations (much) earlier.

Table 1. Evaluation results (#H, |H|, |α| and Time are given as median / avg / max).

          #Probl.  Success  Compl.  #H              |H|       |α|        Time (s)
ORIGIN    1,925    94.7%    61.3%   1/8.51/1850     1/1.00/2  6/7.48/91  0.2/12.4/43.8
JUSTIF    1,803    100.0%   97.2%   1/1.50/5        1/1/1     2/4.21/32  0.2/1.1/34.1
REPAIR    1,805    92.9%    57.0%   43/228.05/6317  1/1.00/2  5/5.09/49  0.6/13.6/59.9


7 Conclusion 


We have introduced connection-minimal TBox abduction for EL, which finds parsimonious hypotheses, ruling out the ones that entail the observation in an arbitrary fashion. We have established a formal link between the generation of connection-minimal hypotheses in EL and the generation of prime implicates of a translation Φ of the problem to first-order logic. In addition to obtaining these theoretical results, we developed a prototype for the computation of subset-minimal constructible hypotheses, a subclass of connection-minimal hypotheses that is easy to construct from the prime implicates of Φ. Our prototype uses the SPASS theorem prover as an SOS-resolution engine to generate the needed implicates. We tested this tool on a set of realistic medical ontologies, and the results indicate that the cost of computing connection-minimal hypotheses is high but not prohibitive.

We see several ways to improve our technique. The bound we computed to ensure termination could be advantageously replaced by a redundancy criterion discarding irrelevant implicates long before it is reached, thus greatly speeding up computation in SPASS. We believe it should also be possible to further constrain inferences, e.g., to have them produce ground clauses only, or to generate the prime implicates with terms of increasing depth in a controlled incremental way instead of enforcing the soft time limit, but these two ideas remain to be proved feasible. As an alternative to using prime implicates, one may investigate direct methods for computing connection-minimal hypotheses in EL.

The theoretical worst-case complexity of connection-minimal abduction is another open question. Our method only gives a very high upper bound: by bounding only the nesting depth of Skolem terms polynomially, as we did with Theorem 14, we may still permit clauses with exponentially many literals, and thus doubly exponentially many clauses in the worst case, which would give us a 2EXPTIME upper bound for the problem of computing all subset-minimal constructible hypotheses. Using structure-sharing and guessing, it is likely possible to obtain a better upper bound, but we have not investigated this yet, and we have not looked at lower bounds for the complexity either.

While this work focuses on abduction problems where the observation is a CI, we believe that our technique can be generalised to knowledge bases that also contain ground facts (ABoxes), and to observations in the form of conjunctive queries on the ABoxes in such knowledge bases. The motivation for such an extension is to understand why a particular query does not return any results, and to compute a set of TBox axioms that fix this problem. Since our translation already transforms the observation into ground facts, it should be possible to extend it to this setting. We would also like to generalize TBox abduction by finding a reasonable way to allow role restrictions in the hypotheses, and to extend connection-minimality to more expressive DLs such as ALC.
Abstract. A clause C is syntactically relevant in some clause set N if it occurs in every refutation of N. A clause C is syntactically semi-relevant if it occurs in some refutation of N. While syntactic relevance coincides with satisfiability (if C is syntactically relevant then N \ {C} is satisfiable), the semantic counterpart for syntactic semi-relevance was not known so far. Using the new notion of a conflict literal, we show that for independent clause sets N a clause C is syntactically semi-relevant in the clause set N if and only if it adds to the number of conflict literals in N. A clause set is independent if no clause out of the clause set is the consequence of different clauses from the clause set.

Furthermore, we relate the notion of relevance to that of a minimally unsatisfiable subset (MUS) of some independent clause set N. In propositional logic, a clause C is relevant if it occurs in all MUSes of some clause set N and semi-relevant if it occurs in some MUS. For first-order logic the characterization needs to be refined with respect to ground instances of N and C.


1 Introduction 


In our previous work [11], we introduced a notion of syntactic relevance based on refutations while at the same time generalizing the completeness result for resolution by the set-of-support strategy (SOS) [28,33] as its test. Our notion of syntactic relevance is useful for explaining why a set of clauses is unsatisfiable. In this paper, we introduce a semantic counterpart of syntactic relevance that sheds further light on the relationship between a clause out of a clause set and the potential refutations of this clause set. In the following Sect. 1.1, we first recall syntactic relevance along with an example and then proceed to explain it in terms of our new semantic relevance in Sect. 1.2.


1.1 Syntactic Relevance 


Given an unsatisfiable set of clauses N, C ∈ N is syntactically relevant if it occurs in all refutations, it is syntactically semi-relevant if it occurs in some refutation, otherwise it is called syntactically irrelevant. The clause-based notion of relevance is useful in relating the contribution of a clause to a refutation (goal conjecture).
© The Author(s) 2022 
This has in particular been shown in the context of product scenarios built out of construction kits as they are used in the car industry [8,32].
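For the propositional case, the abstract characterizes relevance through MUS membership: relevant clauses occur in all MUSes, semi-relevant ones in some MUS. The brute-force sketch below (exponential, for tiny inputs only) enumerates all MUSes of a clause set given as frozensets of integer literals and labels each clause accordingly. It does not check the independence condition that the abstract assumes, and the example clause set is our own.

```python
from itertools import combinations, product


def satisfiable(clauses):
    """Brute-force SAT check; a clause is a frozenset of non-zero integer literals."""
    atoms = sorted({abs(lit) for c in clauses for lit in c})
    for values in product((False, True), repeat=len(atoms)):
        value = dict(zip(atoms, values))
        if all(any(value[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


def muses(clauses):
    """All minimally unsatisfiable subsets (exponential; tiny inputs only)."""
    found = []
    for k in range(1, len(clauses) + 1):
        for subset in combinations(clauses, k):
            if not satisfiable(subset) and all(
                    satisfiable([c for c in subset if c != d]) for d in subset):
                found.append(frozenset(subset))
    return found


def relevance(clauses):
    ms = muses(clauses)
    return {c: "relevant" if ms and all(c in m for m in ms)
            else "semi-relevant" if any(c in m for m in ms)
            else "irrelevant"
            for c in clauses}


# p, ~p v q, ~q, ~p v r, ~r, s  (atoms p, q, r, s numbered 1..4)
N = [frozenset({1}), frozenset({-1, 2}), frozenset({-2}),
     frozenset({-1, 3}), frozenset({-3}), frozenset({4})]
labels = relevance(N)
```

Here p lies in both MUSes and is relevant, the clauses of each of the two chains are semi-relevant, and s is irrelevant.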

For an illustration of our previous notions and results, we now consider the following unsatisfiable first-order clause set N, where Fig. 1 presents a refutation of N.


[Fig. 1: refutation tree of N with clauses (1) A(f(a)) ∨ D(x3), (2) ¬D(x7), (3) ¬B(c,a) ∨ B(b,f(x6)), (4) B(x1,x2) ∨ C(x1), (5) ¬C(x5), (6) ¬A(x4) ∨ ¬B(b,x4) and derived clauses (7) A(f(a)), (8) ¬B(b,f(a)), (9) B(b,f(x6)) ∨ C(c), (10) C(c).]
Fig. 1. A refutation of N in tree representation


In essence, inferences in an SOS refutation always involve at least one clause in the SOS and put the resulting clause back into it. So, this refutation is not an SOS refutation from the syntactically semi-relevant clause (3) ¬B(c,a) ∨ B(b,f(x6)), because only the shaded part represents an SOS refutation starting with this clause. More specifically, there are two inferences ending in (8) ¬B(b,f(a)), which violates the condition for an SOS refutation. Nevertheless, it can be transformed into an SOS refutation where the clause (3) ¬B(c,a) ∨ B(b,f(x6)) is in the SOS [11], Fig. 2. Please note that N \ {(3) ¬B(c,a) ∨ B(b,f(x6))} is still unsatisfiable and classical SOS completeness [33] is not sufficient to guarantee the existence of a refutation with SOS {(3) ¬B(c,a) ∨ B(b,f(x6))} [11].

In addition, N \ {(3) ¬B(c,a) ∨ B(b,f(x6))} is also a minimally unsatisfiable subset (MUS); Fig. 3 presents a respective refutation. A MUS is an unsatisfiable clause set such that removing any clause from it renders it satisfiable. Consequently, a notion of semi-relevance defined via MUSes on the level of the original first-order clauses is not sufficient here. The clause (3) ¬B(c,a) ∨ B(b,f(x6)) should not be disregarded, because it leads to a different grounding of the clauses. For example, in the refutation of Fig. 2, clause (5) ¬C(x5) is necessarily instantiated with {x5 ↦ c}, whereas in the refutation of Fig. 3 it is necessarily instantiated with {x5 ↦ b}. Therefore, the two refutations are different and clause (3) ¬B(c,a) ∨ B(b,f(x6)) should be considered semi-relevant. Nevertheless, in propositional logic it is sufficient to consider MUSes to explain unsatisfiability on the original clause level, Lemma 18.

(9) B(b,f(x6)) ∨ C(c)    from (3), (4) with {x1 ↦ c, x2 ↦ a}
(7) ¬A(f(x6)) ∨ C(c)     from (9), (6) with {x4 ↦ f(x6)}
(8) D(x3) ∨ C(c)         from (7), (1) with {x6 ↦ a}
(10) C(c)                from (8), (2) with {x3 ↦ x7}
⊥                        from (10), (5) with {x5 ↦ c}

Fig. 2. Semi-relevant clause (3) ¬B(c,a) ∨ B(b,f(x6)) in SOS


(12) D(x3) ∨ ¬B(b,f(a))  from (1), (6) with {x4 ↦ f(a)}
(13) ¬B(b,f(a))          from (12), (2) with {x3 ↦ x7}
(14) C(b)                from (4), (13) with {x1 ↦ b, x2 ↦ f(a)}
(11) ⊥                   from (5), (14) with {x5 ↦ b}

Fig. 3. A refutation of N without (3) ¬B(c,a) ∨ B(b,f(x6)) 


1.2 Semantic Relevance 


We now illustrate how our new notion of relevance works on the previous example. First, unlike other works, we propose a way of characterizing semantic relevance by using our novel concept of a conflict literal. A ground literal L is a conflict literal in a clause set N if there are satisfiable sets N1 and N2 of instances from N such that N1 ⊨ L and N2 ⊨ comp(L). On the one hand, explaining an unsatisfiable clause set as the absence of a model (as it is usually defined) is not that helpful, since an absence means there is nothing to discuss in the first place. On the other hand, the contribution of a clause to the unsatisfiability of a clause set can only partially be explained using the concept of a MUS, as discussed before. A conflict literal provides a middle ground between the absence of a model and MUSes for explaining the contribution of a clause to unsatisfiability. It also better reflects our intuition that an unsatisfiable set of clauses contains a contradiction, in the form of two implied simple facts that cannot both be true at the same time.

From Fig. 1, we can already see that C(c) and its complement ¬C(c) are conflict literals because

$$N \setminus \{\neg C(x_5)\} \models C(c) \qquad \{\neg C(x_5)\} \models \neg C(c)$$

In addition, {¬C(x5)} is trivially satisfiable, and N \ {¬C(x5)} is satisfiable as well. Based on the refutation in Fig. 3, ¬C(x5) is syntactically relevant due to N \ {(3) ¬B(c,a) ∨ B(b,f(x6))} being a MUS. We will also show that for a ground MUS any ground literal occurring in it is a conflict literal, Lemma 13. For our ongoing example it is still possible to identify the conflict literals by means of ground MUSes by looking into the refutations from Fig. 1 and Fig. 3. This leads to the following conflict literals for N, see Definition 10:
$$\mathrm{conflict}(N) = \{(\neg)A(f(a)),\ (\neg)B(b,f(a)),\ (\neg)B(c,a),\ (\neg)C(b),\ (\neg)C(c)\} \cup \{(\neg)D(t) \mid t \text{ is a ground term}\}$$


These conflict literals can be identified by pushing the substitutions in the refutations from Fig. 1 and Fig. 3 towards the input clauses. They correspond to two first-order MUSes M1 and M2. All ground literals occurring in them are conflict literals, and all other ground conflict literals can be obtained by grounding the remaining variables.
One can see that, although (3) ¬B(c,a) ∨ B(b,f(x6)) lies outside the only MUS on the first-order level, an instance of it does occur in some ground MUS: take M1 and an arbitrary grounding of x3 and x7 to the identical term t; the conflict literal (¬)B(c,a) depends on clause (3). Nevertheless, determining conflict literals is not so obvious in the general case, since we do not necessarily know beforehand which ground terms should substitute the variables in the clauses. Moreover, there can be an infinite number of such ground MUSes of possibly unbounded size.

Based on conflict literals, we introduce here a notion of relevance that is semantic in nature, Definition 16. This will also serve as an alternative characterization of our previous refutation-based syntactic relevance. As redundant clauses, e.g., tautologies, can also be syntactically semi-relevant, we require independent clause sets for the definition of semantic relevance. A clause set is independent if it does not contain clauses with instances implied by satisfiable sets of instances of different clauses out of the set. Given an unsatisfiable independent set of clauses N, a clause C is relevant in N if N without C has no conflict literals, it is semi-relevant if C is necessary for some conflict literals, and it is irrelevant otherwise.

Similar to our previous work, relevant clauses are the obvious ones, because removing them would make our set satisfiable. On the other hand, irrelevant clauses are immediately identified once we know the semi-relevant ones. For our running example, (3) ¬B(c,a) ∨ B(b,f(x6)) is in fact semi-relevant because it is necessary for the conflict literals (¬)C(c) and (¬)B(c,a). More specifically, the set of conflict literals for N \ {¬B(c,a) ∨ B(b,f(x6))} includes neither (¬)C(c) nor (¬)B(c,a):


$$\mathrm{conflict}(N \setminus \{\neg B(c,a) \lor B(b,f(x_6))\}) = \{(\neg)A(f(a)),\ (\neg)B(b,f(a)),\ (\neg)C(b)\} \cup \{(\neg)D(t) \mid t \text{ is a ground term}\}$$


These are conflict literals identifiable from M2: assume that the variables x3 and x7 in M2 are both grounded by an identical term t. Take some ground literal, for example, A(f(a)) ∈ conflict(N \ {¬B(c,a) ∨ B(b,f(x6))}), and define


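$$\begin{aligned}
N_0 &= \{C \in M_2 \mid A(f(a)) \notin C \text{ and } \neg A(f(a)) \notin C\} = \{(5)\,\neg C(b),\ (4)\,B(b,f(a)) \lor C(b),\ (2)\,\neg D(t)\}\\
N_{A(f(a))} &= \{C \in M_2 \mid A(f(a)) \in C\} = \{(1)\,A(f(a)) \lor D(t)\}\\
N_{\neg A(f(a))} &= \{C \in M_2 \mid \neg A(f(a)) \in C\} = \{(6)\,\neg A(f(a)) \lor \neg B(b,f(a))\}
\end{aligned}$$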


N_0 ∪ N_{A(f(a))} and N_0 ∪ N_{¬A(f(a))} are satisfiable because of the Herbrand models {B(b,f(a)), A(f(a))} and {B(b,f(a))}, respectively. In addition,

$$N_0 \cup N_{A(f(a))} \models A(f(a)) \qquad N_0 \cup N_{\neg A(f(a))} \models \neg A(f(a))$$


Semantic Relevance 213 


because A(f(a)) can be obtained by resolution between (1) and (2) for N_0 ∪ N_{A(f(a))}, and ¬A(f(a)) can be obtained by resolution between (4) and (6), followed by resolution with (5), for N_0 ∪ N_{¬A(f(a))}. In a similar manner, we can show that the other ground literals are also conflict literals.


Related Work: Other works that aim to explain unsatisfiability mostly rely on the notion of MUSes, mainly in propositional logic [14–16,21,26]. Determining whether a clause set is a MUS is D^P-complete for a propositional clause set with at most three literals per clause and at most three occurrences of each propositional variable [25]. In [14], syntactically semi-relevant clauses for propositional logic are called a plain clause set. Using the terminology in [16], a clause C ∈ N is necessary if it occurs in all MUSes, it is potentially necessary if it occurs in some MUS, and otherwise it is never necessary. In addition, a clause is defined to be usable if it occurs in some refutation. This is thus similar to our syntactic notion of semi-relevance [11]: given a clause C ∈ N, C is usable if and only if C is syntactically semi-relevant. It is also argued that a usable clause that is not potentially necessary is semantically superfluous. A different but related notion has also been applied to propositional abduction [7].
The notion of a MUS has also been used for explaining unsatisfiability in first-order logic [20]. There, it has been defined in a more general setting: if a set of clauses N is divided into N = N′ ⊎ N″ with a non-relaxable clause set N′ and a relaxable clause set N″ (which must be satisfiable), a MUS is a subset M of N″ such that N′ ⊎ M is unsatisfiable but removing a clause from M would render it satisfiable. There are also some works in satisfiability modulo theories (SMT) [5,6,9,35]. A deletion-based approach well known in propositional logic has also been used for MUS extraction in SMT [9]. In [5,6], a MUS is extracted by combining an SMT solver with an arbitrary external propositional core extractor. Another approach is to construct a graph representing the subformulas of the problem instance, recursively remove clauses in a depth-first-search manner, and additionally use some heuristics to further improve the runtime [35]. For the function-free and equality-free first-order fragment, there is a “decompose-merge” approach to compute all MUSes [19,34]. In description logic, a notion related to MUSes is called a minimal axiom set (MinA), usually identified by the problem of axiom pinpointing [1,4,13,30]. Its computation is usually divided into two categories: black-box and white-box. A black-box approach picks some inputs, runs them through some sound and complete reasoner, and then interprets the output [13]. A white-box approach, on the other hand, takes some reasoner and modifies it internally; in this case, tableaux are mostly used [1,30]. In addition, the concept of a lean kernel has also been used to approximate the union of such MinAs [27]. The way relevance is defined there is similar in spirit, but it is usually applied to an entailment problem instead of unsatisfiability. The notion of syntactic semi-relevance has also been applied to description logics via a translation scheme to first-order logic [10].

The paper is organized as follows. Section 2 fixes the notations, definitions and existing results, in particular from [11]. Section 3 is reserved for our new notion of semantic relevance. Finally, we conclude our work in Sect. 4 with a discussion of our results.


2 Preliminaries 


We assume a standard first-order language without equality over a signature Σ = (Ω, Π), where Ω is a non-empty set of function symbols and Π a non-empty set of predicate symbols, both coming with their respective fixed arities denoted by the function arity. The set of terms over an infinite set of variables 𝒳 is denoted by T(Σ, 𝒳). Atoms, literals, clauses, and clause sets are defined as usual, e.g., see [24]. We identify a clause with its multiset of literals. Variables in clauses are universally quantified. Then N denotes a clause set; C, D denote clauses; L, K denote literals; A, B denote atoms; P, Q, R, T denote predicates; t, s terms; f, g, h functions; a, b, c, d constants; and x, y, z variables, all possibly indexed. The complement of a literal is denoted by the function comp. Atoms, literals, clauses, and clause sets are ground if they do not contain any variable.

An interpretation ℐ with a nonempty domain (or universe) U assigns (i) a total function f^ℐ : U^n → U to each f ∈ Ω with arity(f) = n and (ii) a relation P^ℐ ⊆ U^m to every predicate symbol P ∈ Π with arity(P) = m. A valuation β is a function 𝒳 → U, where the assignment of some variable x can be modified to e ∈ U by β[x ↦ e]. It is extended to terms as ℐ(β) : T(Σ, 𝒳) → U. Semantic entailment ⊨ considers variables in clauses to be universally quantified. The extension to atoms, literals, disjunctions, clauses and sets of clauses is as follows: ℐ(β)(P(t1, ..., tn)) = 1 if (ℐ(β)(t1), ..., ℐ(β)(tn)) ∈ P^ℐ and 0 otherwise; ℐ(β)(¬φ) = 1 − ℐ(β)(φ); for a disjunction L1 ∨ ... ∨ Lk, ℐ(β)(L1 ∨ ... ∨ Lk) = max(ℐ(β)(L1), ..., ℐ(β)(Lk)); for a clause C, ℐ(β)(C) = 1 if for all valuations β = {x1 ↦ e1, ..., xn ↦ en}, where the xi are the free variables in C, there is a literal L ∈ C such that ℐ(β)(L) = 1; for a set of clauses N = {C1, ..., Ck}, ℐ(β)({C1, ..., Ck}) = min(ℐ(β)(C1), ..., ℐ(β)(Ck)). A set of clauses N is satisfiable if there is an ℐ such that ℐ(β)(N) = 1 for arbitrary β (in this case ℐ is called a model of N: ℐ ⊨ N); otherwise N is called unsatisfiable.
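The clause semantics above can be made concrete for finite interpretations. The following Python sketch is an illustration under stated assumptions, not part of the formal development: it restricts ℐ to a finite domain U so that all valuations β of a clause's free variables can be enumerated, and it uses a hypothetical encoding (variables as strings, terms and atoms as tuples `(symbol, args...)`, literals as `(positive, atom)` pairs). The names `eval_clause` and `is_model` are illustrative only.

```python
from itertools import product

# Hypothetical encoding (assumption, for illustration only):
#   variable -> str, e.g. "x3"
#   term     -> tuple (symbol, arg1, ..., argn), e.g. ("f", ("a",)) for f(a)
#   literal  -> (positive: bool, atom) with atom = (predicate, arg1, ..., argn)
#   clause   -> tuple of literals, read as a disjunction


def variables(t):
    """Free variables of a term or atom."""
    if isinstance(t, str):
        return {t}
    return set().union(*(variables(arg) for arg in t[1:]))


def eval_term(t, interp, beta):
    """I(beta)(t): variables via beta, function symbols via interp["funcs"]."""
    if isinstance(t, str):
        return beta[t]
    return interp["funcs"][t[0]](*(eval_term(a, interp, beta) for a in t[1:]))


def eval_literal(lit, interp, beta):
    positive, atom = lit
    args = tuple(eval_term(a, interp, beta) for a in atom[1:])
    holds = args in interp["preds"][atom[0]]  # P^I given as a set of tuples
    return holds == positive  # a negative literal flips the atom's value


def eval_clause(clause, interp):
    """1 iff every valuation of the clause's free variables satisfies some literal."""
    xs = sorted(set().union(*(variables(atom) for _, atom in clause)))
    for values in product(interp["domain"], repeat=len(xs)):
        beta = dict(zip(xs, values))
        if not any(eval_literal(lit, interp, beta) for lit in clause):
            return False
    return True


def is_model(interp, clauses):
    """I |= N iff I satisfies every clause (the min over all clauses is 1)."""
    return all(eval_clause(c, interp) for c in clauses)


# Illustrative finite interpretation: domain {0, 1}, a and b denote 0,
# f swaps the two elements, A = {1}, B = {(0, 1)}, D is empty.
interp = {
    "domain": [0, 1],
    "funcs": {"a": lambda: 0, "b": lambda: 0, "f": lambda e: 1 - e},
    "preds": {"A": {(1,)}, "B": {(0, 1)}, "D": set()},
}
c1 = ((True, ("A", ("f", ("a",)))), (True, ("D", "x3")))    # A(f(a)) or D(x3)
c2 = ((False, ("D", "x7")),)                                # not D(x7)
c6 = ((False, ("A", "x4")), (False, ("B", ("b",), "x4")))   # not A(x4) or not B(b, x4)
```

With this interpretation, A(f(a)) ∨ D(x3) and ¬D(x7) are both satisfied, while adding ¬A(x4) ∨ ¬B(b, x4) fails because the valuation {x4 ↦ 1} falsifies it; the empty clause is false under every interpretation.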

Substitutions σ, τ are total mappings from variables to terms, where dom(σ) := {x | xσ ≠ x} is finite and codom(σ) := {t | xσ = t, x ∈ dom(σ)}. A renaming σ is a bijective substitution. The application of substitutions is extended to literals, clauses, and sets/sequences of such objects in the usual way. If C′ = Cσ for some substitution σ, then C′ is an instance of C. A unifier σ for a set of terms t1, ..., tk satisfies tiσ = tjσ for all 1 ≤ i, j ≤ k, and it is called a most general unifier if for any unifier σ′ of t1, ..., tk there is a substitution τ such that σ′ = στ. The function mgu denotes the most general unifier of two terms, atoms, or literals if it exists. We assume that any mgu of two terms or literals does not introduce any fresh variables and is idempotent.
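The mgu assumed above can be computed by standard syntactic unification with an occurs check. The Python sketch below is a textbook procedure chosen for illustration, not the implementation behind [11]; it assumes a hypothetical encoding with variables as strings and all other terms (and atoms) as tuples `(symbol, args...)`. It returns an idempotent mgu that binds only variables of the input terms, or `None` if no unifier exists.

```python
from typing import Dict, Optional, Union

# Hypothetical encoding (assumption): a variable is a str such as "x6";
# a constant or compound term is a tuple (symbol, arg1, ..., argn),
# e.g. ("f", ("a",)) stands for f(a). Atoms use the same tuple shape.
Term = Union[str, tuple]
Subst = Dict[str, Term]


def walk(t: Term, s: Subst) -> Term:
    """Follow bindings in s until reaching an unbound variable or a non-variable."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def occurs(x: str, t: Term, s: Subst) -> bool:
    """Occurs check: does variable x appear in t under the bindings s?"""
    t = walk(t, s)
    if isinstance(t, str):
        return t == x
    return any(occurs(x, arg, s) for arg in t[1:])


def apply(t: Term, s: Subst) -> Term:
    """Apply the (triangular) bindings s to t exhaustively."""
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(apply(arg, s) for arg in t[1:])


def mgu(t1: Term, t2: Term) -> Optional[Subst]:
    """Most general unifier of t1 and t2, or None if they do not unify.

    No fresh variables are introduced: every binding maps a variable of
    t1/t2 to a term built from t1/t2. The returned dict is idempotent.
    """
    s: Subst = {}
    stack = [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if isinstance(a, str) or isinstance(b, str):
            var, term = (a, b) if isinstance(a, str) else (b, a)
            if occurs(var, term, s):
                return None  # x would have to equal a term properly containing x
            s[var] = term
        elif a[0] != b[0] or len(a) != len(b):
            return None  # symbol clash or arity mismatch
        else:
            stack.extend(zip(a[1:], b[1:]))
    # Resolve chains of bindings so that the result is idempotent.
    return {x: apply(t, s) for x, t in s.items()}
```

For instance, unifying B(x1, x2) with B(c, a) yields {x1 ↦ c, x2 ↦ a}, the substitution used in the first inference of the running example.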

The resolution calculus consists of two inference rules: Resolution and Factoring [28,29]. The rules operate on a state (N, S), where the initial state for a classical resolution refutation from a clause set N is (∅, N), and for an SOS (Set Of Support) refutation with clause set N and initial SOS clause set S the initial state is (N, S). We describe the rules in the form of abstract rewrite rules operating on states (N, S). As usual, we assume for the Resolution rule that the involved clauses are variable disjoint. This can always be achieved by applying renamings into fresh variables.


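$$\textbf{Resolution}\quad (N, S \uplus \{C \lor K\}) \Rightarrow_{\mathrm{RES}} (N, S \cup \{C \lor K, (D \lor C)\sigma\})$$
provided (D ∨ L) ∈ (N ∪ S) and σ = mgu(L, comp(K))

$$\textbf{Factoring}\quad (N, S \uplus \{C \lor L \lor K\}) \Rightarrow_{\mathrm{RES}} (N, S \cup \{C \lor L \lor K\} \cup \{(C \lor L)\sigma\})$$
provided σ = mgu(L, K)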


The clause (D ∨ C)σ is the result of a Resolution inference between its parents and is called a resolvent. The clause (C ∨ L)σ is the result of a Factoring inference of its parent and is called a factor. A sequence of rule applications (N, S) ⇒*_RES (N, S′) is called a resolution derivation. It is called an SOS resolution derivation if N ≠ ∅. In case ⊥ ∈ S′, it is called a (SOS) resolution refutation. If for two clauses C, D there exists a substitution σ such that Cσ ⊆ D, then we say that C subsumes D. In this case C ⊨ D.
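The two rules can be prototyped directly on top of an mgu. The sketch below is an illustration under assumptions: it repeats a small unification routine so that it is self-contained, uses the same hypothetical tuple encoding (variables as strings, other terms as tuples, literals as `(positive, atom)`), and the names `resolvents` and `factors` are our own. Following the rule's precondition, it assumes the caller has already renamed the two parent clauses apart. Clauses are tuples standing in for multisets, so literal order carries no meaning and (C ∨ D)σ equals (D ∨ C)σ.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical encoding (assumption): variables are str ("x6"), other terms and
# atoms are tuples (symbol, args...); a literal is (positive, atom); a clause is
# a tuple of literals standing in for a multiset.
Term = Union[str, tuple]
Subst = Dict[str, Term]
Literal = Tuple[bool, tuple]
Clause = Tuple[Literal, ...]


def walk(t: Term, s: Subst) -> Term:
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def occurs(x: str, t: Term, s: Subst) -> bool:
    t = walk(t, s)
    return t == x if isinstance(t, str) else any(occurs(x, a, s) for a in t[1:])


def apply(t: Term, s: Subst) -> Term:
    t = walk(t, s)
    return t if isinstance(t, str) else (t[0],) + tuple(apply(a, s) for a in t[1:])


def mgu(t1: Term, t2: Term) -> Optional[Subst]:
    """Syntactic unification with occurs check; idempotent result or None."""
    s: Subst = {}
    stack = [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if isinstance(a, str) or isinstance(b, str):
            var, term = (a, b) if isinstance(a, str) else (b, a)
            if occurs(var, term, s):
                return None
            s[var] = term
        elif a[0] != b[0] or len(a) != len(b):
            return None
        else:
            stack.extend(zip(a[1:], b[1:]))
    return {x: apply(t, s) for x, t in s.items()}


def substitute(clause: Clause, s: Subst) -> Clause:
    return tuple((sign, apply(atom, s)) for sign, atom in clause)


def resolvents(c1: Clause, c2: Clause) -> List[Clause]:
    """Resolution: from C v K and D v L with sigma = mgu(L, comp(K)) derive
    (C v D)sigma. c1 and c2 must already be variable disjoint."""
    out = []
    for i, (sign_k, k) in enumerate(c1):
        for j, (sign_l, l) in enumerate(c2):
            if sign_k == sign_l:
                continue  # L has to unify with the complement of K
            s = mgu(k, l)
            if s is not None:
                rest = c1[:i] + c1[i + 1:] + c2[:j] + c2[j + 1:]
                out.append(substitute(rest, s))
    return out


def factors(c: Clause) -> List[Clause]:
    """Factoring: from C v L v K with sigma = mgu(L, K) derive (C v L)sigma."""
    out = []
    for i in range(len(c)):
        for j in range(i + 1, len(c)):
            if c[i][0] == c[j][0]:
                s = mgu(c[i][1], c[j][1])
                if s is not None:
                    out.append(substitute(c[:j] + c[j + 1:], s))  # drop K = c[j]
    return out
```

Applied to ¬B(c,a) ∨ B(b,f(x6)) and B(x1,x2) ∨ C(x1), the sketch returns the single resolvent B(b,f(x6)) ∨ C(c); resolving ¬C(x5) with C(b) returns the empty clause, represented as `()`.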


Theorem 1 (Soundness and Refutational Completeness of (SOS) Resolution [11,28,33]). Resolution is sound and refutationally complete [28]. If for some clause set N and initial SOS S, N is satisfiable and N ∪ S is unsatisfiable, then there is a (SOS) resolution derivation of ⊥ from (N, S) [33]. If for some clause set N and clause C ∈ N there exists a resolution refutation from N using C, then there is an SOS derivation of ⊥ from (N \ {C}, {C}) [11].


Please note that the recent SOS completeness result of [11] generalizes the 
classical SOS completeness result by [33]. 


Theorem 2 (Deductive Completeness of Resolution [17,22]). Given a set of clauses N and a clause D, if N ⊨ D, then there is a resolution derivation of some clause C from (∅, N) such that C subsumes D.


For deductions we require every clause to be used exactly once, so deductions 
always have a tree form. 


Definition 3 (Deduction [11]). A deduction π_N = [C1, ..., Cn] of a clause Cn from some clause set N is a finite sequence of clauses such that for each Ci the following holds:

1.1 Ci is a renamed, variable-fresh version of a clause in N, or
1.2 there is a clause Cj ∈ π_N, j < i, such that Ci is the result of a Factoring inference from Cj, or
1.3 there are clauses Cj, Ck ∈ π_N, j < k < i, such that Ci is the result of a Resolution inference from Cj and Ck,

and for each Ci ∈ π_N, i < n:

2.1 there exists exactly one factor Cj of Ci with j > i, or
2.2 there exist exactly one Cj and Ck such that Ck is a resolvent of Ci and Cj and i, j < k.


We omit the subscript N in π_N if the context is clear.


A deduction π′ of some clause C ∈ π, where π, π′ are deductions from N, is a subdeduction of π if π′ ⊆ π, where the subset relation is overloaded for sequences. A deduction π_N = [C1, ..., Cn−1, ⊥] is called a refutation. While conditions 3.1.1, 3.1.2, and 3.1.3 are sufficient to represent a resolution derivation, conditions 3.2.1 and 3.2.2 force deductions to be minimal with respect to Cn.

Note that variable renamings are only applied to clauses from N such that all 
clauses from N that are introduced in the deduction are variable disjoint. Also 
recall that our notion of a deduction implies a tree structure. Both assumptions 
together admit the existence of overall grounding substitutions for a deduction. 


Definition 4 (Overall Substitution of a Deduction [11]). Given a deduction π of a clause Cn, the overall substitution τ_{π,i} of Ci ∈ π is recursively defined by

1 if Ci is a factor of Cj with j < i and mgu σ, then τ_{π,i} = τ_{π,j} ∘ σ,
2 if Ci is a resolvent of Cj and Ck with j < k < i and mgu σ, then τ_{π,i} = (τ_{π,j} ∘ τ_{π,k}) ∘ σ,
3 if Ci is an initial clause, then τ_{π,i} = ∅,

and the overall substitution of the deduction is τ_π = τ_{π,n}. We omit the subscript π if the context is clear.


Overall substitutions are well-defined because clauses introduced from N into the deduction are variable disjoint and each clause is used exactly once in the deduction. A grounding of an overall substitution τ of some deduction π is a substitution τδ such that codom(τδ) only contains ground terms and dom(δ) is exactly the set of variables from codom(τ).


Definition 5 (SOS Deduction [11]). A deduction π_{N∪S} = [C1, ..., Cn] is called an SOS deduction with SOS S if the derivation (N, S0) ⇒^m_RES (N, Sm) is an SOS derivation, where C′1, ..., C′m is the subsequence of [C1, ..., Cn] with input clauses removed, S0 = S, and S_{i+1} = S_i ∪ {C′_{i+1}}.


Oftentimes, it is of particular interest to identify the set of clauses that is 
minimally unsatisfiable, i.e., removing a clause would make it satisfiable. The 
earliest mention of such a notion is in [26] where it is introduced via a decision 
problem. Minimally unsatisfiable sets (MUS) have also gained a lot of attention 
in practice. 


Definition 6 (Minimal Unsatisfiable Subset (MUS) [20]). Given an unsatisfiable set of clauses N, a subset N′ ⊆ N is a minimally unsatisfiable subset (MUS) of N if N′ is unsatisfiable and any strict subset of N′ is satisfiable.
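For propositional clause sets, a MUS in the sense of Definition 6 can be obtained by the deletion-based approach mentioned in the related work: try to drop each clause in turn and keep the drop whenever the remaining set stays unsatisfiable. The sketch below is a toy version under assumptions: DIMACS-style integer literals (2 is an atom, -2 its negation), a brute-force truth-table satisfiability check that is exponential and only meant for tiny inputs, and the hypothetical names `satisfiable` and `deletion_mus`.

```python
from itertools import product
from typing import FrozenSet, List

Clause = FrozenSet[int]  # DIMACS-style: 2 is an atom, -2 its negation


def satisfiable(clauses: List[Clause]) -> bool:
    """Brute-force truth-table check (exponential in the number of atoms)."""
    atoms = sorted({abs(lit) for c in clauses for lit in c})
    for values in product((False, True), repeat=len(atoms)):
        val = dict(zip(atoms, values))
        if all(any(val[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


def deletion_mus(clauses: List[Clause]) -> List[Clause]:
    """Deletion-based MUS extraction for an unsatisfiable clause list."""
    if satisfiable(clauses):
        raise ValueError("input must be unsatisfiable")
    mus = list(clauses)
    i = 0
    while i < len(mus):
        trial = mus[:i] + mus[i + 1:]
        if satisfiable(trial):
            i += 1  # mus[i] is necessary: without it the set becomes satisfiable
        else:
            mus = trial  # mus[i] is not needed: drop it, stay at position i
    return mus
```

Since every subset of a satisfiable set is satisfiable, a clause that was found necessary stays necessary after later deletions, so the result is minimal. For the propositional set N of Example 17, with P, Q, R encoded as 1, 2, 3, the sketch returns {P, ¬P}.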
In our previous work, we defined a notion of relevance based on how clauses 
may contribute to unsatisfiability by means of refutations. 


Definition 7 (Syntactic Relevance [11]). Given an unsatisfiable set of clauses N, a clause C ∈ N is syntactically relevant if for all refutations π of N it holds that C ∈ π. A clause C ∈ N is syntactically semi-relevant if there exists a refutation π of N in which C ∈ π. A clause C ∈ N is syntactically irrelevant if there is no refutation π of N in which C ∈ π.


Syntactic relevance can be identified by using the resolution calculus. A clause C ∈ N is syntactically semi-relevant if and only if there exists an SOS refutation from SOS {C} and N \ {C}.


Theorem 8 (Syntactic Relevance [11]). Given an unsatisfiable set of clauses N, the clause C ∈ N is

1. syntactically relevant if and only if N \ {C} is satisfiable,
2. syntactically semi-relevant if and only if (N \ {C}, {C}) ⇒*_RES (N \ {C}, S ∪ {⊥}).
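Theorem 8.2 turns syntactic semi-relevance into an SOS refutation test. For ground (propositional) clauses the test can be run to saturation, because only finitely many clauses exist over the atoms involved. The following sketch is an illustration under assumptions rather than the procedure of [11]: it uses DIMACS-style integer literals, treats clauses as sets (so Factoring is implicit), and the names `sos_refutes` and `syntactically_semi_relevant` are hypothetical. First-order clause sets would instead need a fair saturation that in general does not terminate.

```python
from typing import FrozenSet, Iterable, List, Set

Clause = FrozenSet[int]  # DIMACS-style literals: 3 is an atom, -3 its complement


def resolve(c: Clause, d: Clause) -> List[Clause]:
    """All propositional resolvents of c and d; sets merge duplicate literals,
    so no separate Factoring step is needed."""
    return [(c - {lit}) | (d - {-lit}) for lit in c if -lit in d]


def sos_refutes(n: Iterable[Clause], sos: Iterable[Clause]) -> bool:
    """Saturate (N, S): every inference takes one parent from the SOS and the
    other from N or the SOS, and puts the conclusion into the SOS. Terminates
    because only finitely many clauses exist over the occurring atoms."""
    base: Set[Clause] = set(n)
    support: Set[Clause] = set(sos)
    while frozenset() not in support:
        new = {r for c in support for d in base | support for r in resolve(c, d)}
        new -= support
        if not new:
            return False  # SOS saturated without deriving the empty clause
        support |= new
    return True


def syntactically_semi_relevant(n: List[Clause], c: Clause) -> bool:
    """Theorem 8.2 for ground clauses: refute (N minus {C}, {C}) with SOS {C}."""
    return sos_refutes([d for d in n if d != c], [c])
```

On the propositional set of Example 17 (P, Q, R encoded as 1, 2, 3) every clause passes the test, since the refutation P, ¬P ∨ Q, ¬Q ∨ R, ¬R ∨ P, ¬P uses all of them; a clause on an unrelated atom, such as Q next to {P, ¬P}, does not.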


An open problem from [11] is the question of a semantic counterpart to syntactic semi-relevance. Without any further properties of the clause set N, the notion of semi-relevance can lead to unintuitive results. For example, a tautology could be semi-relevant: given a refutation showing semi-relevance of some clause C in which some unary predicate P occurs, the refutation can be immediately extended using the tautology P(x) ∨ ¬P(x). We may additionally stumble upon a problem in the case where our set of clauses contains a subsumed clause. For example, if both clauses Q(a) and Q(x) exist in a clause set, they may both be semi-relevant, although intuitively one may only want to consider Q(x) to be semi-relevant, or even relevant. On the other hand, in some cases, redundant clauses are welcome as semi-relevant clauses.


Example 9 (Redundant Clauses). Given a set of clauses

$$N = \{Q(x),\ Q(a),\ \neg Q(a) \lor P(b),\ \neg P(b),\ P(x) \lor \neg P(x)\},$$

all clauses are syntactically semi-relevant, while ¬Q(a) ∨ P(b) and ¬P(b) are syntactically relevant. However, if we disregard the redundant clauses Q(a) and P(x) ∨ ¬P(x), then the clause Q(x) becomes a relevant clause. Therefore, for our semantic notion of relevance we will only consider clause sets without clauses implied by other, different clauses from the clause set.


3 Semantic Relevance 


Except for the trivially false clause ⊥, the simplest form of a contradiction is two unit clauses K and L such that K and comp(L) are unifiable. They will be called conflict literals below. Then the idea for our semantic definition of semi-relevance is to consider clauses that contribute to the number of conflict literals of a clause set. Furthermore, we will show that in any ground MUS every literal is a conflict literal.

While conflict literals could straightforwardly be defined in propositional logic with the above idea in mind, in first-order logic we always have to relate properties of literals and clauses to their respective ground instances. This is simply due to the fact that unsatisfiability of a first-order clause set is witnessed by unsatisfiability of a finite set of ground instances from this set. Eventually, we will show that for independent clause sets a clause is semi-relevant if it contributes to the number of conflict literals.


Definition 10 (Conflict Literal). Given a set of clauses N over some signature Σ, a ground literal L is a conflict literal in N if there are two satisfiable clause sets N1, N2 such that

1. the clauses in N1, N2 are instances of clauses from N, and
2. N1 ⊨ L and N2 ⊨ comp(L).

conflict(N) denotes the set of conflict literals in N.


Our notion of a conflict literal generalizes the respective notion in [12] defined 
for propositional logic. 
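In propositional logic every instance of a clause is the clause itself, so Definition 10 reduces to asking whether some satisfiable subset of N entails L and some satisfiable subset entails comp(L). The brute-force sketch below computes conflict(N) this way; it assumes DIMACS-style integer literals and truth-table reasoning (so only a handful of clauses is feasible), and the names `entails` and `conflict_literals` are hypothetical. It does not cover the first-order case, where N1 and N2 range over infinitely many candidate instances.

```python
from itertools import product
from typing import FrozenSet, List, Set

Clause = FrozenSet[int]  # DIMACS-style literals: 2 is an atom, -2 its negation


def satisfiable(clauses: List[Clause]) -> bool:
    """Brute-force truth-table check (exponential in the number of atoms)."""
    atoms = sorted({abs(lit) for c in clauses for lit in c})
    for values in product((False, True), repeat=len(atoms)):
        val = dict(zip(atoms, values))
        if all(any(val[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


def entails(clauses: List[Clause], lit: int) -> bool:
    """clauses |= lit iff clauses together with comp(lit) is unsatisfiable."""
    return not satisfiable(clauses + [frozenset({-lit})])


def conflict_literals(n: List[Clause]) -> Set[int]:
    """Propositional conflict(N): the instances are the clauses themselves."""
    subsets = [[n[i] for i in range(len(n)) if mask >> i & 1]
               for mask in range(1 << len(n))]
    sat_subsets = [s for s in subsets if satisfiable(s)]
    atoms = {abs(lit) for c in n for lit in c}
    implied = {
        lit
        for s in sat_subsets
        for a in atoms
        for lit in (a, -a)
        if entails(s, lit)
    }
    # A conflict literal needs a satisfiable subset for it and one for its complement.
    return {lit for lit in implied if -lit in implied}
```

On the propositional set of Example 17 (P, Q, R encoded as 1, 2, 3) this yields all six literals over P, Q, R, while dropping ¬Q ∨ R leaves only P and ¬P, matching the claim made there; for a satisfiable set the result is empty, in line with Lemma 14.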


Example 11 (Conflict Literal). Given an unsatisfiable set of clauses over the signature Σ = ({a, b, c, d, f}, {P}):

$$N = \{\neg P(f(a,x)) \lor \neg P(f(c,y)),\ P(f(x,d)) \lor P(f(y,b))\}$$

Consider the following satisfiable sets of instances from N:

$$\begin{aligned}
N_1 &= \{\neg P(f(a,d)) \lor \neg P(f(c,y)),\ P(f(x,d)) \lor P(f(a,b))\}\\
N_2 &= \{\neg P(f(a,b)) \lor \neg P(f(c,y)),\ P(f(x,d)) \lor P(f(c,b))\}
\end{aligned}$$

P(f(a,b)) is a conflict literal because N1 ⊨ P(f(a,b)) and N2 ⊨ ¬P(f(a,b)).


We can show that N1 ⊨ P(f(a,b)) because the resolution calculus is sound: resolving both literals of ¬P(f(a,d)) ∨ ¬P(f(c,y)) with the first literal of (renamed copies of) the clause P(f(x,d)) ∨ P(f(a,b)) results in the clause P(f(a,b)) ∨ P(f(a,b)), which can be factorized to P(f(a,b)). Moreover, N1 is satisfiable: an interpretation ℐ with ℐ(P(f(a,b))) = 1 and ℐ(P(t)) = 0 for all terms t ≠ f(a,b) satisfies N1 and P(f(a,b)). N2 ⊨ ¬P(f(a,b)) can be shown in the same manner.


Example 12 (Conflict Literal). Given

$$N = \{\neg R(x),\ R(c) \lor P(a,y),\ \neg Q(a),\ Q(x) \lor P(x,b),\ \neg P(a,b)\}$$

its conflict literals are

$$\mathrm{conflict}(N) = \{P(a,b),\ \neg P(a,b),\ R(c),\ \neg R(c),\ Q(a),\ \neg Q(a)\}$$


In addition to a refutation, the existence of a conflict literal is another way 
to characterize unsatisfiability of a clause set. Obviously, conflict literals always 
come in pairs. 


Lemma 13 (Minimal Unsatisfiable Ground Clause Sets and Conflict Literals). If N is a minimally unsatisfiable set of ground clauses (MUS), then any literal occurring in N is a conflict literal.

Proof. Take any ground atom A such that A occurs in N. N can be split into three disjoint clause sets:

$$\begin{aligned}
N_0 &= \{C \in N \mid A \notin C \text{ and } \neg A \notin C\}\\
N_A &= \{C \in N \mid A \in C\}\\
N_{\neg A} &= \{C \in N \mid \neg A \in C\}
\end{aligned}$$

Since N is minimal, N_A and N_{¬A} are nonempty, because otherwise A is a pure literal and its corresponding clauses can be removed from N while preserving unsatisfiability. Obviously, N_0 ∪ N_A must be satisfiable, for otherwise the initial choice of N was not minimal. However, N_0 ∪ N′_A, where N′_A results from N_A by deleting all A literals from its clauses, must be unsatisfiable, for otherwise we could construct a satisfying interpretation for N by additionally making A false. Thus, every model of N_0 ∪ N_A must also be a model of A (a model falsifying A would satisfy N_0 ∪ N′_A): N_0 ∪ N_A ⊨ A. Using the same argument, N_0 ∪ N_{¬A} is satisfiable and N_0 ∪ N_{¬A} ⊨ ¬A. Therefore, A and ¬A are conflict literals.


Lemma 14 (Conflict Literals and Unsatisfiability). Given a set of clauses N, conflict(N) ≠ ∅ if and only if N is unsatisfiable.

Proof. “⇒” Let L ∈ conflict(N). By definition, there are two satisfiable sets of instances N1, N2 from N such that N1 ⊨ L and N2 ⊨ comp(L). Towards a contradiction, suppose N is satisfiable. Then there exists an interpretation ℐ with ℐ ⊨ N, and therefore ℐ ⊨ N1 and ℐ ⊨ N2. Consequently, by N1 ⊨ L and N2 ⊨ comp(L), we get ℐ ⊨ L and ℐ ⊨ comp(L), a contradiction.

“⇐” Given an unsatisfiable clause set N, we show that there is a conflict literal in N. Since N is unsatisfiable, by compactness of first-order logic there is a finite unsatisfiable set of ground instances from N, and hence a minimal one N′. The rest follows from Lemma 13.


Intuitively, a clause that is implied by other clauses is redundant and can be removed from the set of clauses. However, when applying a calculus that generates new clauses, this intuitive notion of redundancy may destroy completeness [2,23]. Still, the detection and elimination of redundant clauses, compatible or incompatible with completeness, is an important concept for the efficiency of automated reasoning, e.g., in propositional logic [3,18]. It is also important when we try to define a semantic notion of relevance. For example, a syntactically relevant clause would step down to being syntactically semi-relevant if it were duplicated. So, in order to have a semantically robust notion of relevance in first-order logic, we need to use a strong notion of (in)dependency.


Definition 15 (Dependency). A clause C is dependent in N if there exists a satisfiable set of instances N′ from N \ {C} such that N′ ⊨ Cσ for some σ. If C is not dependent in N, it is independent in N. A clause set N is independent if it does not contain any dependent clauses.
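Definition 15 can likewise be checked by brute force in the propositional case, where σ plays no role and N′ ranges over subsets of N \ {C}. The sketch below assumes DIMACS-style integer literals and exponential enumeration, and the names `entails_clause` and `is_dependent` are hypothetical; it tests whether some satisfiable N′ entails C by checking that N′ together with the complements of C's literals is unsatisfiable.

```python
from itertools import product
from typing import FrozenSet, List

Clause = FrozenSet[int]  # DIMACS-style literals: 2 is an atom, -2 its negation


def satisfiable(clauses: List[Clause]) -> bool:
    """Brute-force truth-table check (exponential in the number of atoms)."""
    atoms = sorted({abs(lit) for c in clauses for lit in c})
    for values in product((False, True), repeat=len(atoms)):
        val = dict(zip(atoms, values))
        if all(any(val[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


def entails_clause(clauses: List[Clause], clause: Clause) -> bool:
    """clauses |= clause iff clauses plus the complement of every literal
    of clause is unsatisfiable."""
    return not satisfiable(clauses + [frozenset({-lit}) for lit in clause])


def is_dependent(n: List[Clause], c: Clause) -> bool:
    """Propositional Definition 15: some satisfiable subset of N minus {C} entails C."""
    rest = [d for d in n if d != c]
    for mask in range(1 << len(rest)):
        sub = [rest[i] for i in range(len(rest)) if mask >> i & 1]
        if satisfiable(sub) and entails_clause(sub, c):
            return True
    return False
```

For Example 17 (P, Q, R encoded as 1, 2, 3), the sketch reports ¬P ∨ Q and ¬R ∨ P as dependent (they are entailed by ¬P and by P, respectively) and ¬Q ∨ R as independent.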


A subsumed clause is obviously a dependent clause. However, there could 
also be non-subsumed clauses that are dependent. For example, in the set of 
clauses 

$$N = \{P(a,y),\ P(x,b),\ \neg P(a,b)\}$$


P(x,b) is dependent because P(a,b) is an instance of P(x,b) and it is entailed 
by P(a,y). Now, we are ready to define the semantic notion of relevance based 
on conflict literals and dependency. 

In some way, our notion of independence of clause sets is a strong assumption, because there might be non-redundant clauses that are considered dependent. While independence holds by design in some scenarios (e.g., the car scenario mentioned above), in others it is violated by design. In addition, one question that may arise is how to obtain an independent clause set from a dependent one. Consider, for example, a scenario where some theory is developed from independent axioms. Then, of course, proven lemmas and theorems are dependent with respect to the axioms. In this case, one could trace the dependency relations between the intermediate lemmas, theorems, and the axioms out of the proofs and thereby calculate independent clause sets with respect to some proven conjecture. This would then again lead to independent (sub)clause sets with respect to the proven conjecture, to which our results are applicable.


Definition 16 (Semantic Relevance). Given an unsatisfiable set of independent clauses N, a clause C ∈ N is

1. relevant, if conflict(N \ {C}) = ∅,
2. semi-relevant, if conflict(N \ {C}) ⊊ conflict(N),
3. irrelevant, if conflict(N \ {C}) = conflict(N).


Example 17 (Dependent Clauses in Propositional Logic).

$$N = \{P,\ \neg P,\ \neg P \lor Q,\ \neg R \lor P,\ \neg Q \lor R\}$$

The existence of the dependent clauses ¬P ∨ Q and ¬R ∨ P causes the independent clause ¬Q ∨ R to be semi-relevant. However, ¬Q ∨ R is not inside the only MUS {P, ¬P}.




Very often, concepts from propositional logic can be generalized to first-order 
logic. However, in the context of relevance this is not the case. Our notion of 
(semi-)relevance can also be characterized by MUSes in propositional logic, but 
not in first-order logic without considering instances of clauses. 


Lemma 18 (Propositional Clause Sets and Relevance). Given an inde- 
pendent unsatisfiable set of propositional clauses N, the relevant clauses coincide 
with the intersection of all MUSes and the semi-relevant clauses coincide with 
the union of all MUSes. 


Proof. For the case of relevance: given C ∈ N, C is relevant if and only if conflict(N \ {C}) = ∅, if and only if N \ {C} is satisfiable by Lemma 14, if and only if C is contained in all MUSes N′ of N.


For the case of semi-relevance: given C ∈ N, we show that C is semi-relevant if and only if C is in some MUS N′ ⊆ N.


“⇒”: Towards a contradiction, suppose there is a semi-relevant clause C that is not in any MUS. By definition of semi-relevant clauses, there are satisfiable sets N1 and N2 and a propositional variable P such that N1 ⊨ P and N2 ⊨ ¬P, but the MUS M out of N1 ∪ N2 does not contain C. By Theorem 2 there exist deductions π1 and π2 of P and ¬P from N1 and N2, respectively. Since a deduction is connected, some clauses in M and (N1 ∪ N2) \ M must have complementary propositional literals Q and ¬Q, respectively, which are eventually resolved upon in either π1 or π2. At least one of these deductions must contain this resolution step between a clause from M and one from (N1 ∪ N2) \ M. Now, by Lemma 13, the literals Q and ¬Q are conflict literals in M. Thus, there are satisfiable subsets of M which entail Q and ¬Q, respectively. Therefore, the clause containing Q or ¬Q in (N1 ∪ N2) \ M is dependent, contradicting the assumption that N does not contain dependent clauses.


“⇐”: If C is in some MUS N′ ⊆ N, then N′ \ {C} is satisfiable. So, invoking Lemma 13, any literal L ∈ C is a conflict literal in N′. In addition, L is not a conflict literal in N \ {C}, for otherwise C is dependent: suppose L is a conflict literal in N \ {C}; then, by definition, there is a satisfiable subset of N \ {C} which entails L. However, since L ⊨ C, this means that C is dependent.
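Definition 16 and Lemma 18 can be cross-checked on tiny propositional examples. The sketch below classifies each clause by comparing conflict(N \ {C}) with conflict(N), enumerates all MUSes, and tests whether the relevant clauses coincide with the intersection of all MUSes and the semi-relevant ones (counting relevant clauses as semi-relevant) with their union. Everything here is brute force over DIMACS-style integer literals, and all names are hypothetical. The test sets {P, ¬P, Q, ¬Q, R} and {P ∨ Q, ¬P, ¬Q} are our own examples, both independent and unsatisfiable; a passing check on them illustrates, but of course does not prove, the lemma.

```python
from itertools import product
from typing import FrozenSet, List, Set

Clause = FrozenSet[int]  # DIMACS-style literals: 2 is an atom, -2 its negation


def satisfiable(clauses: List[Clause]) -> bool:
    """Brute-force truth-table check (exponential in the number of atoms)."""
    atoms = sorted({abs(lit) for c in clauses for lit in c})
    for values in product((False, True), repeat=len(atoms)):
        val = dict(zip(atoms, values))
        if all(any(val[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


def conflict_literals(n: List[Clause]) -> Set[int]:
    """Propositional conflict(N): complementary pairs implied by satisfiable subsets."""
    atoms = {abs(lit) for c in n for lit in c}
    implied: Set[int] = set()
    for mask in range(1 << len(n)):
        sub = [n[i] for i in range(len(n)) if mask >> i & 1]
        if satisfiable(sub):
            implied |= {lit for a in atoms for lit in (a, -a)
                        if not satisfiable(sub + [frozenset({-lit})])}
    return {lit for lit in implied if -lit in implied}


def without(n: List[Clause], c: Clause) -> List[Clause]:
    return [d for d in n if d != c]


def classify(n: List[Clause], c: Clause) -> str:
    """Definition 16; a relevant clause is reported as 'relevant'."""
    rest = conflict_literals(without(n, c))
    if not rest:
        return "relevant"
    return "semi-relevant" if rest < conflict_literals(n) else "irrelevant"


def all_muses(n: List[Clause]) -> List[FrozenSet[int]]:
    """All MUSes of n as frozensets of clause indices (brute force)."""
    index_sets = [frozenset(i for i in range(len(n)) if mask >> i & 1)
                  for mask in range(1 << len(n))]
    unsat = [s for s in index_sets if not satisfiable([n[i] for i in s])]
    return [s for s in unsat if not any(t < s for t in unsat)]


def lemma18_holds(n: List[Clause]) -> bool:
    """Relevant = intersection of all MUSes; semi-relevant (incl. relevant) = union.
    Requires n to be unsatisfiable, so that at least one MUS exists."""
    muses = all_muses(n)
    full = conflict_literals(n)
    relevant = {i for i, c in enumerate(n) if not conflict_literals(without(n, c))}
    semi = {i for i, c in enumerate(n) if conflict_literals(without(n, c)) < full}
    return (relevant == set(frozenset.intersection(*muses))
            and semi == set(frozenset().union(*muses)))
```

For {P, ¬P, Q, ¬Q, R} the two MUSes {P, ¬P} and {Q, ¬Q} make the first four clauses semi-relevant, none relevant, and R irrelevant; for {P ∨ Q, ¬P, ¬Q} the only MUS is the whole set, and all three clauses are relevant.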


The next example demonstrates that the notion of a MUS cannot be carried 
over straightforwardly to the level of clauses with variables to characterize semi- 
relevant clauses in first-order logic. 


Example 19 (First-Order Relevant Clauses). Given a set of clauses

$$N = \{P(a,y),\ \neg P(a,d) \lor Q(b,d),\ \neg P(x,c),\ \neg Q(b,d) \lor P(d,c),\ Q(z,e)\}$$


over Σ = ({a, b, c, d, e}, {P, Q}). The conflict literals are


$$\{(\neg)P(a,c),\ (\neg)Q(b,d),\ (\neg)P(d,c),\ (\neg)P(a,d)\}.$$
The clause P(a, y) is relevant. The literals entailed by some satisfiable set of instances N′ from N such that P(a, y) ∉ N′ are {¬Q(b, d), ¬P(a, d)} ∪ {¬P(t, c), Q(t, e) | t ∈ {a, b, c, d, e}}, and no two of them are complementary. Thus, conflict(N \ {P(a, y)}) = ∅. The clause ¬P(a, d) ∨ Q(b, d) is semi-relevant: Q(b, d) ∈ conflict(N) \ conflict(N \ {¬P(a, d) ∨ Q(b, d)}). The clause Q(z, e) is irrelevant.


With respect to MUSes, the clause ¬P(a, d) ∨ Q(b, d) from Example 19 is irrelevant. The only MUS of N is {P(a, y), ¬P(x, c)}, with grounding substitution {x ↦ a, y ↦ c}. However, in first-order logic we should not ignore the clauses ¬P(a, d) ∨ Q(b, d) and ¬Q(b, d) ∨ P(d, c), because together with the clauses P(a, y) and ¬P(x, c) they result in a different grounding {x ↦ d, y ↦ d}. So, we argue that MUS-based (semi-)relevance on the original clause set is not sufficient to characterize the way clauses are used to derive a contradiction in full first-order logic. However, it is sufficient if ground instances are considered.


Lemma 20 (Relevance and MUSes on First-Order Clauses). Let N be an unsatisfiable set of independent first-order clauses. Then a clause C is relevant in N if all MUSes of unsatisfiable sets of ground instances from N contain a ground instance of C. The clause C is semi-relevant in N if there exists a MUS of an unsatisfiable set of ground instances from N that contains a ground instance of C.


Proof. (Relevance) Since all ground MUSes from N contain a ground instance of C, if N \ {C} contained a ground MUS from N, then some ground instance of C would be entailed by N \ {C}. This violates our assumption that N contains no dependent clauses. Thus, N \ {C} contains no ground MUSes. This further means that N \ {C} is satisfiable by the compactness theorem of first-order logic. By Lemma 14 it therefore has no conflict literals, and C is relevant.
(Semi-Relevance) Take some ground MUS M containing some ground instance C′ of C. Due to Lemma 13, any literal P ∈ C′ is a conflict literal in M and consequently also in N. In addition, P is not a conflict literal in N \ {C}, for otherwise C is dependent: suppose P is a conflict literal in N \ {C}. Then, by definition, there are some satisfiable instances from N \ {C} which entail P. However, since P ⊨ C′, this means C is dependent. In conclusion, P ∈ conflict(N) \ conflict(N \ {C}), and thus C is semi-relevant.


In Example 19, we could identify two ground MUSes: 
$$\{P(a,c),\ \neg P(a,c)\}$$

and

$$\{P(a,d),\ \neg P(a,d) \lor Q(b,d),\ \neg P(d,c),\ \neg Q(b,d) \lor P(d,c)\}$$
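Treating each ground atom as a propositional variable, the MUS property of the two ground sets can be checked mechanically. In the Python sketch below (our own illustration; the string encoding with a leading `~` for negation and the helper names are assumptions), a set is a MUS exactly when it is unsatisfiable and deleting any single clause makes it satisfiable, since every proper subset is contained in such a one-clause deletion.

```python
from itertools import product


def clause(*lits):
    """Build a ground clause from strings such as 'P(a,d)' or '~P(a,d)'."""
    return frozenset((False, s[1:]) if s.startswith("~") else (True, s) for s in lits)


def is_sat(clauses):
    """Brute-force SAT check that treats every ground atom as a propositional variable."""
    atoms = sorted({atom for c in clauses for _, atom in c})
    for bits in product([False, True], repeat=len(atoms)):
        value = dict(zip(atoms, bits))
        if all(any(value[atom] == positive for positive, atom in c) for c in clauses):
            return True
    return False


def is_mus(clauses):
    """Unsatisfiable, and deleting any single clause makes the set satisfiable."""
    clauses = set(clauses)
    return not is_sat(clauses) and all(is_sat(clauses - {c}) for c in clauses)


# The two ground MUSes of Example 19
mus1 = [clause("P(a,c)"), clause("~P(a,c)")]
mus2 = [clause("P(a,d)"), clause("~P(a,d)", "Q(b,d)"),
        clause("~P(d,c)"), clause("~Q(b,d)", "P(d,c)")]
```

Both `is_mus(mus1)` and `is_mus(mus2)` hold, whereas adding a ground instance of Q(z, e), such as Q(a, e), to the second set destroys minimality, in line with the irrelevance of Q(z, e).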


Our notion of relevance is thus alternatively explainable using Lemma 20: P(a, y) is relevant because every ground MUS contains an instance of it (P(a, c) and P(a, d), respectively). The clause ¬P(a, d) ∨ Q(b, d) is semi-relevant as it is directly contained in the second MUS. The clause Q(z, e) is irrelevant since no MUS contains any instance of Q(z, e). On the other hand, we may still encounter the case where a dependent clause is categorized as syntactically semi-relevant. Therefore, by using the dependency notion while at the same time not restricting a refutation to use only a MUS as its input set, we can show that (semi-)relevance coincides with syntactic (semi-)relevance. The semi-decidability result then also follows.


Theorem 21 (Semantic versus Syntactic Relevance). Given an independent, unsatisfiable set of clauses N in first-order logic, the (semi-)relevant clauses coincide with the syntactically (semi-)relevant clauses.


Proof. We show the following: if N contains no dependent clause, then C is (semi-)relevant if and only if C is syntactically (semi-)relevant. The case for relevant clauses is a consequence of Lemma 14. Now, we show it for semi-relevant clauses.
“⇒”: Let L be a ground literal with L ∈ conflict(N) \ conflict(N \ {C}). We construct a refutation using C. There are two satisfiable sets of instances N₁, N₂ from N such that N₁ ⊨ L and N₂ ⊨ comp(L), where N₁ ∪ N₂ contains at least one instance of C, for otherwise L ∉ conflict(N) \ conflict(N \ {C}). By deductive completeness, Theorem 2, and the fact that L and comp(L) are ground literals, there are two variable-disjoint deductions π₁ and π₂ of some literals K₁ and K₂ such that K₁σ = L and K₂σ = comp(L) for some grounding σ. The two variable-disjoint deductions can then be combined into a refutation π₁.π₂.⊥ containing C. Thus, C is syntactically semi-relevant in N.


“⇐”: Given an SOS refutation π using C, i.e., an SOS refutation π from N \ {C} with SOS {C} and overall grounding substitution σ, we show that C is semantically semi-relevant. Let N′ be the variable-renamed copies of clauses from N \ {C} used in the refutation, and S′ the renamed copies of C used in the refutation. First, we show that N′σ is satisfiable. Towards a contradiction, suppose N′σ is unsatisfiable and let Mσ ⊆ N′σ be a MUS of it. Since π is connected, some clauses in Mσ and in S′σ ∪ (N′σ \ Mσ) contain literals L and comp(L), respectively. By Lemma 13, L and comp(L) are also conflict literals in Mσ. So, by Definition 15, the clause containing comp(L) in S′σ ∪ (N′σ \ Mσ) is dependent, violating our initial assumption.

Now, since N′σ is satisfiable, there is a ground MUS of (N′ ∪ S′)σ containing some C′σ ∈ S′σ. Due to Lemma 13, any L ∈ C′σ is a conflict literal in this MUS (and consequently also in N). In addition, L is not a conflict literal in N \ {C}, for otherwise C is dependent: suppose L is a conflict literal in N \ {C}. Then, by definition, there are some satisfiable instances from N \ {C} which entail L. However, since L ⊨ C′σ, this means C is dependent. In conclusion, L ∈ conflict(N) \ conflict(N \ {C}), and thus C is semi-relevant.


When we have a ground MUS, identification of conflict literals is obvious because all of the literals in it are conflict literals. However, testing whether a literal L is a conflict literal is not trivial in general. One can try enumerating all MUSes and checking whether L is contained in one of them. This works for propositional logic, despite being computationally expensive. In first-order logic, this is problematic because there could potentially be an infinite number of MUSes, and determining a MUS is not even semi-decidable in general. The following lemma provides a semi-decidable test using the SOS strategy.
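For propositional clause sets, conflict literals can also be computed straight from the characterization the proofs above rely on: L is a conflict literal if some satisfiable subset entails L and some satisfiable subset entails comp(L). The Python sketch below is a brute-force illustration under that reading, not a test proposed in the paper. Clauses are frozensets of non-zero integers with `-p` for ¬p, and all names are hypothetical. It enumerates every subset, so it is only usable for toy inputs and says nothing about the first-order case.

```python
from itertools import combinations, product


def is_sat(clauses):
    """Brute-force SAT check; a clause is a frozenset of non-zero ints, -p encodes not p."""
    atoms = sorted({abs(lit) for c in clauses for lit in c})
    for bits in product([False, True], repeat=len(atoms)):
        value = dict(zip(atoms, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


def entails(clauses, lit):
    """clauses |= lit iff clauses plus the unit clause comp(lit) is unsatisfiable."""
    return not is_sat(list(clauses) + [frozenset({-lit})])


def conflict_literals(clauses):
    """Literals L such that some satisfiable subset entails L and another entails comp(L)."""
    clauses = list(clauses)
    atoms = sorted({abs(lit) for c in clauses for lit in c})
    candidates = [lit for a in atoms for lit in (a, -a)]
    entailed = set()
    for k in range(len(clauses) + 1):
        for subset in combinations(clauses, k):
            if is_sat(subset):
                entailed.update(lit for lit in candidates if entails(subset, lit))
    return {lit for lit in entailed if -lit in entailed}
```

On N = {P, ¬P ∨ Q, ¬Q, S}, the sketch returns P, ¬P, Q, and ¬Q, but not S, because no satisfiable subset entails ¬S.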
Lemma 22. Given a ground literal L and an unsatisfiable set of clauses N with no dependent clauses, L is a conflict literal if and only if there is an SOS refutation from (N, {L ∨ comp(L)}).


Proof. “⇒”: By deductive completeness, Theorem 2, and the fact that L and comp(L) are ground literals, there are two variable-disjoint deductions π₁ and π₂ of some literals K₁ and K₂ such that K₁σ = L and K₂σ = comp(L) for some grounding σ. The two variable-disjoint deductions can be combined into a refutation π₁.π₂.⊥. We can then construct a refutation π₁.π₂.(L ∨ comp(L)).comp(L).⊥, where K₂ is resolved with L ∨ comp(L) to get comp(L), which is then resolved with K₁ from π₁ to get ⊥. By Theorem 7, this means there is an SOS refutation from (N, {L ∨ comp(L)}).

“⇐”: Given an SOS refutation π using L ∨ comp(L), i.e., an SOS refutation π from N \ {L ∨ comp(L)} with SOS {L ∨ comp(L)}, let N′ be the variable-renamed versions of clauses from N and σ the overall grounding substitution. Then N′σ is a MUS, for otherwise there is a dependent clause: suppose N′σ \ M is a MUS where M is non-empty. Since π is connected, some clause D′ in M must be resolved with some D ∈ N′σ upon some literal K. Thus, by Lemma 13, K and comp(K) are also conflict literals in N′σ \ M. So, by Definition 15, the clause subsuming D′ in N is dependent, violating our initial assumption. Finally, because L occurs in N′σ and N′σ is a MUS, L is a conflict literal by Lemma 13.
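Restricted to propositional clauses, the test of Lemma 22 can be sketched as set-of-support saturation: start with the tautology L ∨ comp(L) as the only support clause, allow only resolution inferences with at least one support premise, add every resolvent to the support, and report whether the empty clause appears. The Python code below is our own illustration of that idea (function names hypothetical, clauses as frozensets of non-zero integers). It does not check the lemma's precondition that N has no dependent clauses. In the propositional case it always terminates, because only finitely many clauses exist over the given atoms.

```python
def resolvents(c1, c2):
    """All binary resolvents of two clauses (frozensets of non-zero ints, -p = not p)."""
    return [(c1 - {lit}) | (c2 - {-lit}) for lit in c1 if -lit in c2]


def sos_refutable(clauses, sos):
    """Saturate under resolution where every inference uses a set-of-support clause.

    Derived clauses join the set of support; returns True once the empty clause is derived.
    """
    base = set(clauses)
    support = set(sos)
    todo = list(support)
    while todo:
        given = todo.pop()
        # Pair the given support clause with every input clause and every support clause.
        for other in list(base | support):
            for r in resolvents(given, other):
                if not r:
                    return True
                if r not in support:
                    support.add(r)
                    todo.append(r)
    return False


def is_conflict_literal(clauses, lit):
    """Propositional reading of Lemma 22: SOS refutation from (N, {L or comp(L)})."""
    return sos_refutable(clauses, [frozenset({lit, -lit})])
```

On N = {P, ¬P ∨ Q, ¬Q, S}, the sketch finds SOS refutations for P and Q, but not for S or for an atom that does not occur in N.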


4 Conclusion 


The main results of this paper are: (i) a semantic notion of relevance based on the 
existence of conflict literals, Definition 10, and Definition 16, (ii) its relationship 
to syntactic relevance, namely, both notions coincide for independent clause 
sets, Theorem 21, and (iii) the relationship of semantic relevance to minimal 
unsatisfiable sets, MUSes, both for propositional logic, Lemma 18, and first- 
order logic, Lemma 20. 

The semantic relevance notion sheds some further light on the way clauses 
may contribute to a refutation beyond what can be offered by the notion of 
MUSes. While the syntactic notion of semi-relevance also considers redundant 
clauses such as tautologies to be semi-relevant, the semantic notion rules out 
redundant clauses. Here, the notions only coincide for independent clause sets. 
Still, the syntactic notion is “easier” to test, and there are applications where clause sets do not contain implied clauses by construction. For such clause sets, syntactic relevance coincides with semantic relevance. For example, first-order toolbox formalizations have this property because every tool is formalized by its own distinct predicate. Still, a goal, i.e., a refutation, can be reached by using different tools. The classic example is the toolbox for car/truck/tractor building [8,31].
Abstract. We propose a new calculus SCL(EQ) for first-order logic with equality that only learns non-redundant clauses. Following the idea of CDCL (Conflict Driven Clause Learning) and SCL (Clause Learning from Simple Models), a ground literal model assumption is used to guide inferences that are then guaranteed to be non-redundant. Redundancy is defined with respect to a dynamically changing ordering derived from the ground literal model assumption. We prove SCL(EQ) sound and complete and provide examples where our calculus improves on superposition.


Keywords: First-order logic with equality · Term rewriting · Model-based reasoning


1 Introduction 


There has been extensive research on sound and complete calculi for first-order 
logic with equality. The current prime calculus is superposition [2], where order- 
ing restrictions guide paramodulation inferences and an abstract redundancy 
notion enables a number of clause simplification and deletion mechanisms, such 
as rewriting or subsumption. Still, this “syntactic” form of superposition infers many redundant clauses. The completeness proof of superposition provides a “semantic” way of generating only non-redundant clauses; however, the underlying ground model assumption cannot be effectively computed in general [31]. It
requires an ordered enumeration of infinitely many ground instances of the given 
clause set, in general. Our calculus overcomes this issue by providing an effective 
way of generating ground model assumptions that then guarantee non-redundant 
inferences on the original clauses with variables. 

The underlying ordering is based on the order of ground literals in the model assumption and hence changes during a run of the calculus. It incorporates a standard rewrite ordering. For practical redundancy criteria this means that both
rewriting and redundancy notions that are based on literal subset relations are 
permitted to dynamically simplify or eliminate clauses. Newly generated clauses 
are non-redundant, so redundancy tests are only needed backwards. Further- 
more, the ordering is automatically generated by the structure of the clause set. 


© The Author(s) 2022 
Instead of a fixed ordering as done in the superposition case, the calculus finds
and changes an ordering according to the currently easiest way to make progress, 
analogous to CDCL (Conflict Driven Clause Learning) [11, 21,25, 29,34]. 

As is typical for CDCL and SCL (Clause Learning from Simple Models) [1,14,18] approaches to reasoning, a model assumption is developed by decisions and propagations. A decision guesses a ground literal to be true, whereas
a propagation concludes the truth of a ground literal through an otherwise false 
clause. While propagations in CDCL and propositional logic are restricted to 
the finite number of propositional variables, in first-order logic there can already 
be infinite propagation sequences [18]. In order to overcome this issue, model 
assumptions in SCL(EQ) are at any point in time restricted to a finite number 
of ground literals, hence to a finite number of ground instances of the clause set 
at hand. Therefore, without increasing the number of considered ground literals, 
the calculus either finds a refutation or runs into a stuck state where the current 
model assumption satisfies the finite number of ground instances. In this case 
one can check whether the model assumption can be generalized to a model 
assumption of the overall clause set or the information of the stuck state can 
be used to appropriately increase the number of considered ground literals and 
continue the search for a refutation. SCL(EQ) does not require exhaustive propagation in general; it just forbids deciding the complement of a literal that could otherwise be propagated.

For an example of SCL(EQ) inferring clauses, consider the three first-order 
clauses 


$$C_1 := h(x) \approx g(x) \lor c \approx d \qquad C_2 := f(x) \approx g(x) \lor a \approx b$$
$$C_3 := f(x) \not\approx h(x) \lor f(x) \not\approx g(x)$$

with a Knuth-Bendix Ordering (KBO), unique weight 1, and precedence d ≺ c ≺ b ≺ a ≺ g ≺ h ≺ f. A Superposition Left [2] inference between C₂ and C₃ results in

$$C_s := h(x) \not\approx g(x) \lor f(x) \not\approx g(x) \lor a \approx b.$$


For SCL(EQ) we start by building a partial model assumption, called a trail, 
with two decisions 


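$$\Gamma := [\,h(a) \approx g(a)^{1:(h(x) \approx g(x) \lor h(x) \not\approx g(x))\cdot\sigma},\ f(a) \approx g(a)^{2:(f(x) \approx g(x) \lor f(x) \not\approx g(x))\cdot\sigma}\,]$$

where σ := {x ↦ a}. Decisions and propagations are always ground instances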
of literals from the first-order clauses, and are annotated with a level and a justification clause, which in case of a decision is a tautology. Now, with respect to Γ, clause C₃ is false with grounding σ, and rule Conflict is applicable; see Sect. 3.1 for details on the inference rules. In general, clauses and justifications are considered variable disjoint, but for simplicity of the presentation of this example, we repeat variable names here as long as the same ground substitution is shared. The maximal literal in C₃σ is (f(x) ≉ h(x))σ, and a rewrite refutation using the ground equations from the trail results in the justification clause

$$(g(x) \not\approx g(x) \lor f(x) \not\approx g(x) \lor f(x) \not\approx g(x) \lor h(x) \not\approx g(x))\cdot\sigma$$
where for the refutation, for justification clauses, and for all otherwise inferred clauses we use the grounding σ for guidance, but operate on the clauses with variables. The respective ground clause is smaller than (f(x) ≉ h(x))σ, false with respect to Γ, and becomes our new conflict clause by an application of our inference rule Explore-Refutation. It is simplified by our inference rules Equality-Resolution and Factorize, resulting in the finally learned clause

$$C_4 := h(x) \not\approx g(x) \lor f(x) \not\approx g(x)$$

which is then used to apply rule Backtrack to the trail. Observe that C₄ is strictly stronger than Cₛ, the clause inferred by superposition, and that C₄ cannot be inferred by superposition. Thus SCL(EQ) can infer stronger clauses than superposition for this example.


Related Work: SCL(EQ) is based on ideas of SCL [1,14,18] but for the first time 
includes a native treatment of first-order equality reasoning. Similar to [14], propagations need not be exhaustively applied, the trail is built out of decisions and propagations of ground literals annotated by first-order clauses, and SCL(EQ) only learns non-redundant clauses. However, for the first time, conflicts resulting from a decision have to be considered, due to the nature of the equality relation.

Several approaches have been suggested to lift the idea of an inference-guiding model assumption from propositional to full first-order logic [6, 12, 13, 18].
They do not provide a native treatment of equality, e.g., via paramodulation or 
rewriting. 

Baumgartner et al. describe multiple calculi that handle equality by using 
unit superposition style inference rules and are based on either hyper tableaux [5] 
or DPLL [15,16]. Hyper tableaux fix a major problem of the well-known free 
variable tableaux, namely the fact that free variables within the tableau are 
rigid, i.e., substitutions have to be applied to all occurrences of a free variable 
within the entire tableau. Hyper tableaux with equality [7] in turn integrates 
unit superposition style inference rules into the hyper tableau calculus. 

Another approach that is related to ours is the model evolution calculus with equality (ME_E) by Baumgartner et al. [8,9], which lifts the DPLL calculus to first-order logic with equality. Similar to our approach, ME_E creates a candidate
model until a clause instance contradicts this model or all instances are satisfied 
by the model. The candidate model results from a so-called context, which con- 
sists of a finite set of non-ground rewrite literals. Roughly speaking, a context 
literal specifies the truth value of all its ground instances unless a more specific 
literal specifies the complement. Initially the model satisfies the identity relation 
over the set of all ground terms. Literals within a context may be universal or parametric, where universal literals guarantee all their ground instances to be true. If a clause contradicts the current model, it is repaired by a non-deterministic split which adds a parametric literal to the current model. If the added literal does not share any variables with the contradictory clause, it is added as a universal literal.

Another approach by Baumgartner and Waldmann [10] combined the superposition calculus with the Model Evolution calculus with equality. In this calculus the atoms of the clauses are labeled as “split atoms” or “superposition atoms”. The superposition part of the calculus then generates a model for the superposition atoms while the model evolution part generates a model for the split atoms. In particular, if all atoms are labeled as “split atom”, the calculus behaves similarly to the model evolution calculus. If all atoms are labeled as “superposition atom”, it behaves like the superposition calculus.

Both the hyper tableaux calculus with equality and the model evolution cal- 
culus with equality allow only unit superposition applications, while SCL(EQ) 
inferences are guided paramodulation inferences on clauses of arbitrary length. 
The model evolution calculus with equality was revised and implemented in 2011 [8], and its performance was compared with that of hyper tableaux. Model evolution performed significantly better than the implementation of the hyper tableaux calculus, with more problems solved in all relevant TPTP [30] categories.

Plaisted et al. [27] present the Ordered Semantic Hyper-Linking (OSHL) cal- 
culus. OSHL is an instantiation based approach that repeatedly chooses ground 
instances of a non-ground input clause set such that the current model does not 
satisfy the current ground clause set. A further step repairs the current model 
such that it satisfies the ground clause set again. The algorithm terminates if 
the set of ground clauses contains the empty clause. OSHL supports rewriting 
and narrowing, but only with unit clauses. In order to handle non-unit clauses 
it makes use of other mechanisms such as Brand’s Transformation [3]. 

Inst-Gen [22] is an instantiation-based calculus that creates ground instances of the input first-order formulas, which are forwarded to a SAT solver. If a ground instance is unsatisfiable, then the first-order set is as well. If not, then the calculus creates more instances. The Inst-Gen-EQ calculus [23] creates instances
by extracting instantiations of unit superposition refutations of selected liter- 
als of the first-order clause set. The ground abstraction is then extended by the 
extracted clauses and an SMT solver then checks the satisfiability of the resulting 
set of equational and non-equational ground literals. 

In favor of examples and explanations, we omit all proofs. They are available in an extended version published as a research report [24]. The rest of the paper is organized as follows. Section 2 provides basic formalisms underlying SCL(EQ). The rules of the calculus are presented in Sect. 3. Soundness and completeness results are provided in Sect. 4. We end with a discussion of obtained results and future work, Sect. 5. The main contribution of this paper is the SCL(EQ) calculus that only learns non-redundant clauses, permits subset-based redundancy elimination and rewriting, and its soundness and completeness.


2 Preliminaries 


We assume a standard first-order language with equality and signature Σ = (Ω, ∅), where the only predicate symbol is equality ≈. N denotes a set of clauses, C, D denote clauses, L, K, H denote equational literals, A, B denote equational atoms, t, s terms from T(Ω, X) for an infinite set of variables X, f, g, h function symbols from Ω, a, b, c constants from Ω, and x, y, z variables from X. The function comp denotes the complement of a literal. We write s ≉ t as a shortcut for ¬(s ≈ t). The literal s ≈̇ t may denote both s ≈ t and s ≉ t. The semantics of first-order logic and semantic entailment ⊨ are defined as usual.

By σ, τ, δ we denote substitutions, which are total mappings from variables to terms. Let σ be a substitution; then its finite domain is defined as dom(σ) := {x | xσ ≠ x} and its codomain is defined as codom(σ) := {t | xσ = t, x ∈ dom(σ)}. We extend their application to literals, clauses, and sets of such objects in the usual way. A term, literal, clause, or set of these objects is ground if it does not contain any variable. A substitution σ is ground if codom(σ) is ground. A substitution σ is grounding for a term t, literal L, clause C if tσ, Lσ, Cσ is ground, respectively. By C·σ, L·σ we denote a closure consisting of a clause C or literal L, respectively, and a grounding substitution σ. The function gnd computes the set of all ground instances of a literal, clause, or clause set. The function mgu denotes the most general unifier of terms, atoms, literals, respectively. We assume that mgus do not introduce fresh variables and that they are idempotent.
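The requirement that mgus introduce no fresh variables and are idempotent is met by the usual Robinson-style unification if each new binding is immediately composed into the existing ones. The Python sketch below illustrates this together with dom and codom. The encoding is our assumption: variables are strings, a term f(t₁, ..., tₙ) is the tuple `('f', t₁, ..., tₙ)`, and constants are 1-tuples such as `('a',)`; all function names are hypothetical.

```python
def apply(sub, t):
    """Apply substitution sub (dict: variable -> term) to term t."""
    if isinstance(t, str):
        return sub.get(t, t)
    return (t[0],) + tuple(apply(sub, a) for a in t[1:])


def dom(sub):
    return {x for x, t in sub.items() if t != x}


def codom(sub):
    return {sub[x] for x in dom(sub)}


def occurs(x, t):
    return t == x if isinstance(t, str) else any(occurs(x, a) for a in t[1:])


def mgu(s, t):
    """Robinson-style unification: an idempotent mgu without fresh variables, or None."""
    sub = {}
    stack = [(s, t)]
    while stack:
        a, b = stack.pop()
        a, b = apply(sub, a), apply(sub, b)
        if a == b:
            continue
        if isinstance(b, str) and not isinstance(a, str):
            a, b = b, a  # put the variable, if any, first
        if isinstance(a, str):
            if occurs(a, b):
                return None  # occurs check failure, e.g. x vs f(x)
            # Compose eagerly so that no bound variable occurs in the codomain (idempotence).
            sub = {x: apply({a: b}, u) for x, u in sub.items()}
            sub[a] = b
        elif a[0] != b[0] or len(a) != len(b):
            return None  # symbol clash
        else:
            stack.extend(zip(a[1:], b[1:]))
    return sub
```

For example, `mgu(('f', 'x', ('g', 'y')), ('f', ('a',), ('g', 'x')))` yields `{'x': ('a',), 'y': ('a',)}`, and applying this unifier twice gives the same result as applying it once.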

The set of positions pos(L) of a literal (pos(t) of a term t) is inductively defined as usual. The notation L|_p denotes the subterm of a literal L (t|_p for a term t) at position p ∈ pos(L) (p ∈ pos(t)). The replacement of a subterm of a literal L (term t) at position p ∈ pos(L) (p ∈ pos(t)) by a term s is denoted by L[s]_p (t[s]_p). For example, the term f(a, g(x)) has the positions {ε, 1, 2, 21}, f(a, g(x))|_21 = x, and f(a, g(x))[b]_2 denotes the term f(a, b).
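Positions can be represented as tuples of 1-based argument indices, with the empty tuple standing for ε. The Python sketch below reproduces the example above using a hypothetical term encoding (variables as strings, f(t₁, ..., tₙ) as the tuple `('f', t₁, ..., tₙ)`). Positions of a literal s ≈̇ t can be handled the same way by treating the literal as a binary term.

```python
def positions(t):
    """pos(t) as tuples of 1-based argument indices; () is the root position epsilon."""
    if isinstance(t, str):  # a variable only has the root position
        return {()}
    result = {()}
    for i, arg in enumerate(t[1:], start=1):
        result |= {(i,) + p for p in positions(arg)}
    return result


def subterm(t, p):
    """t|_p: follow the argument indices (tuple index 0 holds the function symbol)."""
    for i in p:
        t = t[i]
    return t


def replace(t, p, s):
    """t[s]_p: replace the subterm of t at position p by s."""
    if not p:
        return s
    i = p[0]
    return t[:i] + (replace(t[i], p[1:], s),) + t[i + 1:]
```

With t = f(a, g(x)) encoded as `('f', ('a',), ('g', 'x'))`, `positions(t)` is `{(), (1,), (2,), (2, 1)}`, `subterm(t, (2, 1))` is `'x'`, and `replace(t, (2,), ('b',))` is the encoding of f(a, b).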

Let R be a set of rewrite rules l → r, called a term rewrite system (TRS). The rewrite relation →_R ⊆ T(Ω, X) × T(Ω, X) is defined as usual by s →_R t if there exist (l → r) ∈ R, p ∈ pos(s), and a matcher σ such that s|_p = lσ and t = s[rσ]_p. We write s = t↓_R if s is the normal form of t in the rewrite relation →_R. We write s ≈̇ t = (s′ ≈̇ t′)↓_R if s is the normal form of s′ and t is the normal form of t′. A rewrite relation is terminating if there is no infinite descending chain t₀ → t₁ → ..., and confluent if t ←* s →* t′ implies t →* u ←* t′ for some term u. A rewrite relation is convergent if it is terminating and confluent. A rewrite order is an irreflexive and transitive rewrite relation. A TRS R is terminating, confluent, or convergent if the rewrite relation →_R is terminating, confluent, or convergent, respectively. A term t is called irreducible by a TRS R if no rule from R rewrites t. Otherwise it is called reducible. A literal or clause is irreducible if all of its terms are irreducible, and reducible otherwise. A substitution σ is called irreducible if every t ∈ codom(σ) is irreducible, and reducible otherwise.
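The following Python sketch spells out the rewrite relation →_R and the computation of t↓_R: a step searches for a position p and a matcher σ with s|_p = lσ and replaces that subterm by rσ. It is an illustration under our own assumptions (terms as nested tuples with variables as strings, an innermost-leftmost choice of redex, hypothetical names). The result is the unique normal form only when R is convergent, and the loop terminates only when R is terminating.

```python
def apply(sub, t):
    """Apply a substitution; variables are strs, f(t1,...,tn) is ('f', t1, ..., tn)."""
    if isinstance(t, str):
        return sub.get(t, t)
    return (t[0],) + tuple(apply(sub, a) for a in t[1:])


def match(pattern, term, sub=None):
    """Return a matcher sigma with pattern*sigma == term, or None if there is none."""
    sub = dict(sub or {})
    if isinstance(pattern, str):
        if pattern in sub:
            return sub if sub[pattern] == term else None
        sub[pattern] = term
        return sub
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        sub = match(p_arg, t_arg, sub)
        if sub is None:
            return None
    return sub


def rewrite_once(rules, t):
    """One innermost, leftmost step t ->_R t', or None if t is irreducible by R."""
    if not isinstance(t, str):
        for i, arg in enumerate(t[1:], start=1):
            r = rewrite_once(rules, arg)
            if r is not None:
                return t[:i] + (r,) + t[i + 1:]
    for lhs, rhs in rules:
        sigma = match(lhs, t)
        if sigma is not None:
            return apply(sigma, rhs)
    return None


def normal_form(rules, t):
    """Rewrite until irreducible; terminates only if R is terminating."""
    while (r := rewrite_once(rules, t)) is not None:
        t = r
    return t
```

For R = {f(x) → g(x), g(a) → b}, the term h(f(a)) rewrites to h(g(a)) and then to the irreducible term h(b).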

Let ≺_T denote a well-founded rewrite ordering on terms which is total on ground terms and such that for every ground term t there exist only finitely many ground terms s ≺_T t. We call ≺_T a desired term ordering. We extend ≺_T to equations by assigning the multiset {s, t} to positive equations s ≈ t and {s, s, t, t} to inequations s ≉ t. Furthermore, we identify ≺_T with its multiset extension comparing multisets of literals. For a (multi)set of terms {t₁, ..., tₙ} and a term t, we define {t₁, ..., tₙ} ≺_T t if {t₁, ..., tₙ} ≺_T {t}. For a (multi)set of literals {L₁, ..., Lₙ} and a term t, we define {L₁, ..., Lₙ} ≺_T t if {L₁, ..., Lₙ} ≺_T {{t}}. Given a ground term β, gnd_{≺_T β} computes the set of all ground instances of a literal, clause, or clause set where the groundings are smaller than β according to the ordering ≺_T. Given a set (sequence) of ground literals Γ, let conv(Γ) be a convergent rewrite system out of the positive equations in Γ using ≺_T.
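The multiset encoding of equations and the multiset extension can be made concrete with a small Python sketch. As a stand-in for ≺_T we assume a ground KBO in which every symbol has weight 1 and the precedence is d ≺ c ≺ b ≺ a ≺ g ≺ h ≺ f, as in the introductory example. Multisets are compared with the Dershowitz-Manna definition. All names are hypothetical, ground terms are nested tuples such as `('f', ('a',))`, and a literal is a triple `(positive, s, t)`.

```python
from collections import Counter

# Hypothetical precedence taken from the introductory example: d < c < b < a < g < h < f
PREC = {symbol: rank for rank, symbol in enumerate("dcbaghf")}


def size(t):
    """Number of symbol occurrences, i.e. the KBO weight when every symbol weighs 1."""
    return 1 + sum(size(a) for a in t[1:])


def kbo_gt(s, t):
    """Ground KBO with unit weights: weight, then head precedence, then arguments left to right."""
    if size(s) != size(t):
        return size(s) > size(t)
    if s[0] != t[0]:
        return PREC[s[0]] > PREC[t[0]]
    for si, ti in zip(s[1:], t[1:]):
        if si != ti:
            return kbo_gt(si, ti)
    return False


def mul_gt(m, n, gt):
    """Dershowitz-Manna: m > n iff m != n and every element that n has in excess
    is dominated by some element that m has in excess."""
    m, n = Counter(m), Counter(n)
    only_m, only_n = m - n, n - m
    if not only_m and not only_n:
        return False
    return all(any(gt(x, y) for x in only_m) for y in only_n)


def lit_terms(lit):
    """s = t is encoded as the multiset {s, t}, s != t as {s, s, t, t}."""
    positive, s, t = lit
    return [s, t] if positive else [s, s, t, t]


def lit_gt(k, l):
    return mul_gt(lit_terms(k), lit_terms(l), kbo_gt)


def clause_gt(c, d):
    return mul_gt(c, d, lit_gt)
```

With this encoding a ≉ b is larger than a ≈ b, since {a, a, b, b} strictly contains {a, b}, and clauses are compared by applying the same multiset extension once more to their literals.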

Let ≺ be a well-founded, total, strict ordering on ground literals, which is lifted to clauses and clause sets by its respective multiset extension. We overload ≺ for literals, clauses, and clause sets if the meaning is clear from the context. The ordering is lifted to the non-ground case via instantiation: we define C ≺ D if for all grounding substitutions σ it holds that Cσ ≺ Dσ. Then we define ⪯ as the reflexive closure of ≺ and N^{⪯C} := {D | D ∈ N and D ⪯ C}, and use the standard superposition-style notion of redundancy [2].


Definition 1 (Clause Redundancy). A ground clause C is redundant with respect to a set N of ground clauses and an ordering ≺ if N^{⪯C} ⊨ C. A clause C is redundant with respect to a clause set N and an ordering ≺ if for all C′ ∈ gnd(C), C′ is redundant with respect to gnd(N).


3 The SCL(EQ) Calculus 


We start the introduction of the calculus by defining the ingredients of an 
SCL(EQ) state. 


Definition 2 (Trail). A trail Γ := [L_1^{i_1:C_1·σ_1}, ..., L_n^{i_n:C_n·σ_n}] is a consistent sequence of ground equations and inequations where L_j is annotated by a level i_j with i_{j−1} ≤ i_j, and a closure C_j·σ_j. We omit the annotations if they are not needed in a certain context. A ground literal L is true in Γ if Γ ⊨ L. A ground literal L is false in Γ if Γ ⊨ comp(L). A ground literal L is undefined in Γ if Γ ⊭ L and Γ ⊭ comp(L). Otherwise it is defined. For each literal L_j in Γ it holds that L_j is undefined in [L_1, ..., L_{j−1}] and irreducible by conv({L_1, ..., L_{j−1}}).


The above definition of truth and undefinedness is extended to clauses in the obvious way. The notions of true, false, and undefined can be parameterized by a ground term β by saying that L is β-undefined in a trail Γ if β ≺_T L or L is undefined. The notions of β-true and β-false literals are restrictions of the above notions to literals smaller than β, respectively. All SCL(EQ) reasoning is layered with respect to a ground term β.


Definition 3. Let Γ be a trail and L a ground literal such that L is defined in Γ. By core(Γ; L) we denote a minimal subsequence Γ′ ⊆ Γ such that L is defined in Γ′. By cores(Γ; L) we denote the set of all cores.


Note that core(Γ; L) is not necessarily unique. There can be multiple cores for a given trail Γ and ground literal L.


Definition 4 (Trail Ordering). Let Γ := [L_1, ..., L_n] be a trail. The (partial) trail ordering ≺_Γ is the sequence ordering given by Γ, i.e., L_i ≺_Γ L_j iff i < j for all 1 ≤ i, j ≤ n.
Definition 5 (Defining Core and Defining Literal). For a trail Γ and a sequence of literals Δ ⊆ Γ we write max_{≺_Γ}(Δ) for the largest literal in Δ according to the trail ordering ≺_Γ. Let Γ be a trail and L a ground literal such that L is defined in Γ. Let Δ ∈ cores(Γ; L) be a sequence of literals where max_{≺_Γ}(Δ) ⪯_Γ max_{≺_Γ}(Λ) for all Λ ∈ cores(Γ; L); then max_Γ(L) := max_{≺_Γ}(Δ) is called the defining literal and Δ is called a defining core for L in Γ. If cores(Γ; L) contains only the empty core, then L has no defining literal and no defining core.


Note that there can be multiple defining cores but only one defining literal for any defined literal L. For example, consider a trail Γ := [f(a) ≈ f(b)^{1:C_1·σ_1}, a ≈ b^{2:C_2·σ_2}, b ≈ c^{3:C_3·σ_3}] with an ordering ≺_T that orders the terms of the equations from left to right, and a literal g(f(a)) ≈ g(f(c)). Then the defining cores are Δ_1 := [a ≈ b, b ≈ c] and Δ_2 := [f(a) ≈ f(b), b ≈ c]. The defining literal, however, is in both cases b ≈ c. Defined literals that have no defining core, and therefore no defining literal, are literals that are trivially false or true. Consider, for example, g(f(a)) ≈ g(f(a)). This literal is trivially true in Γ. Thus an empty subset of Γ is sufficient to show that g(f(a)) ≈ g(f(a)) is defined in Γ.


Definition 6 (Literal Level). Let Γ be a trail. A ground literal L ∈ Γ is of level i if L is annotated with i in Γ. A defined ground literal L ∉ Γ is of level i if the defining literal of L is of level i. If L has no defining literal, then L is of level 0. A ground clause D is of level i if i is the maximum level of a literal in D.


The restriction to minimal subsequences for the defining literal and defini- 
tion of a level eventually guarantee that learned clauses are smaller in the trail 
ordering. This enables completeness in combination with learning non-redundant 
clauses as shown later. 


Lemma 7. Let Γ_1 be a trail and K a defined literal that is of level i in Γ_1. Then K is of level i in a trail Γ := Γ_1, Γ_2.


Definition 8. Let Γ be a trail and L ∈ Γ a literal. L is called a decision literal if Γ = Γ_0, K^{i:C·τ}, L^{i+1:C′·τ′}, Γ_1. Otherwise L is called a propagated literal.


In our above example, g(f(a)) ≈ g(f(c)) is of level 3 since the defining literal b ≈ c is annotated with 3. The literal a ≉ b, on the other hand, is of level 2.
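To make Definitions 3 to 6 concrete, the following Python sketch recomputes the example trail: it enumerates cores by brute force and reads off the defining literal and the level. It is our own illustration, not an implementation of the calculus. The helper names are hypothetical, literals are encoded as `(positive, lhs, rhs)` tuples over ground terms, and entailment from (a subsequence of) the trail is decided by a naive congruence closure, which assumes the trail is consistent as Definition 2 requires.

```python
from itertools import combinations

# Ground terms are tuples ('f', t1, ..., tn); a literal is (positive, lhs, rhs).


def subterms(t, acc):
    acc.add(t)
    for a in t[1:]:
        subterms(a, acc)


def cc_entails(eqs, s, t):
    """Naive congruence closure: do the ground equations eqs entail s = t?"""
    universe = set()
    for l, r in eqs + [(s, t)]:
        subterms(l, universe)
        subterms(r, universe)
    parent = {u: u for u in universe}

    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u

    for l, r in eqs:
        parent[find(l)] = find(r)
    changed = True
    while changed:  # close under congruence: equal arguments give equal applications
        changed = False
        for u, v in combinations(universe, 2):
            if (u[0] == v[0] and len(u) == len(v) and find(u) != find(v)
                    and all(find(x) == find(y) for x, y in zip(u[1:], v[1:]))):
                parent[find(u)] = find(v)
                changed = True
    return find(s) == find(t)


def entails(lits, lit):
    """Gamma |= lit for a consistent set of ground equational literals."""
    eqs = [(s, t) for pos, s, t in lits if pos]
    neqs = [(s, t) for pos, s, t in lits if not pos]
    positive, s, t = lit
    if positive:
        return cc_entails(eqs, s, t)
    # Gamma |= s != t iff adding s = t merges both sides of some inequation in Gamma
    return any(cc_entails(eqs + [(s, t)], u, v) for u, v in neqs)


def defined(lits, lit):
    positive, s, t = lit
    return entails(lits, lit) or entails(lits, (not positive, s, t))


def cores(trail, lit):
    """Definition 3: all minimal subsequences (as index tuples) in which lit is defined."""
    lits = [l for l, _ in trail]
    found = []
    for k in range(len(lits) + 1):
        for idx in combinations(range(len(lits)), k):
            if any(set(c) <= set(idx) for c in found):
                continue  # contains a smaller core, hence not minimal
            if defined([lits[i] for i in idx], lit):
                found.append(idx)
    return found


def defining_index(trail, lit):
    """Index of the defining literal (Definition 5), or None if only the empty core exists."""
    cs = cores(trail, lit)
    if not cs:
        raise ValueError("literal is undefined in the trail")
    if cs == [()]:
        return None
    return min(max(c) for c in cs)


def level(trail, lit):
    """Definition 6: annotation for trail literals, else level of the defining literal, else 0."""
    for l, lvl in trail:
        if l == lit:
            return lvl
    j = defining_index(trail, lit)
    return 0 if j is None else trail[j][1]
```

On the example trail [f(a) ≈ f(b), a ≈ b, b ≈ c] with levels 1, 2, 3, the sketch finds exactly the two cores Δ₁ and Δ₂ for g(f(a)) ≈ g(f(c)), identifies b ≈ c as the defining literal, and reports levels 3 for g(f(a)) ≈ g(f(c)), 2 for a ≉ b, and 0 for g(f(a)) ≈ g(f(a)). Because cores are found by enumerating subsequences, it is exponential in the trail length and only suitable for toy examples.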

We define a well-founded total strict ordering which is induced by the trail 
and with which non-redundancy is proven in Sect. 4. Unlike SCL [14,18], we
use this ordering for the inference rules as well. In previous SCL calculi, conflict 
resolution automatically chooses the greatest literal and resolves with this literal. 
In SCL(EQ) this is generalized. Coming back to our running example above, 
suppose we have a conflict clause f(b) ≉ f(c) ∨ b ≉ c. The defining literal for both inequations is b ≈ c. So we could do paramodulation inferences with both literals.
The following ordering makes this non-deterministic choice deterministic. 
Definition 9 (Trail Induced Ordering). Let Γ := [L_1^{i_1:C_1·σ_1}, ..., L_n^{i_n:C_n·σ_n}] be a trail, β a ground term such that {L_1, ..., L_n} ≺_T β, and M_{i,j} all β-defined ground literals not contained in Γ ∪ comp(Γ): for a defining literal max_Γ(M_{i,j}) = L_i, and for two literals M_{i,j}, M_{i,k} we have j < k if M_{i,j} ≺_T M_{i,k}. The trail induces a total well-founded strict order ≺_Γ* on β-defined ground literals M_{k,l}, M_{m,n}, L_i, L_j of level greater than zero, where

1. M_{k,l} ≺_Γ* M_{m,n} if k < m or (k = m and l < n)
2. L_i ≺_Γ* L_j if L_i ≺_Γ L_j
3. comp(L_i) ≺_Γ* L_j if L_i ≺_Γ L_j
4. L_i ≺_Γ* comp(L_j) if L_i ≺_Γ L_j or i = j
5. comp(L_i) ≺_Γ* comp(L_j) if L_i ≺_Γ L_j
6. L_i ≺_Γ* M_{k,l} and comp(L_i) ≺_Γ* M_{k,l} if i ≤ k
7. M_{k,l} ≺_Γ* L_i and M_{k,l} ≺_Γ* comp(L_i) if k < i

and for all β-defined literals L of level zero:

8. ≺_Γ* = ≺_T
9. L ≺_Γ* K if K is of level greater than zero and K is β-defined

and can eventually be extended to β-undefined ground literals K, H by

10. K ≺_Γ* H if K ≺_T H
11. L ≺_Γ* H if L is β-defined

The literal ordering ≺_Γ* is extended to ground clauses by multiset extension and identified with ≺_Γ* as well.


Lemma 10 (Properties of $\prec_{\Gamma^*}$).


1. $\prec_{\Gamma^*}$ is well-defined.
2. $\prec_{\Gamma^*}$ is a total strict order, i.e. $\prec_{\Gamma^*}$ is irreflexive, transitive and total.
3. $\prec_{\Gamma^*}$ is a well-founded ordering.


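Example 11. Assume a trail $\Gamma := [a \approx b, c \approx d, f(a') \not\approx f(b')]$, select KBO as the term
ordering $\prec_T$ where all symbols have weight one and precedence $a \prec a' \prec b \prec b' \prec c \prec d \prec f$,
and a ground term $\beta := f(f(a))$. According to the trail induced ordering we have that
$a \approx b \prec_{\Gamma^*} c \approx d \prec_{\Gamma^*} f(a') \not\approx f(b')$ by 9.2. Furthermore we have that

$a \approx b \prec_{\Gamma^*} a \not\approx b \prec_{\Gamma^*} c \approx d \prec_{\Gamma^*} c \not\approx d \prec_{\Gamma^*} f(a') \not\approx f(b') \prec_{\Gamma^*} f(a') \approx f(b')$

by 9.3 and 9.4. Now for any literal $L$ that is $\beta$-defined in $\Gamma$ and whose defining
literal is $a \approx b$, it holds that $a \not\approx b \prec_{\Gamma^*} L \prec_{\Gamma^*} c \approx d$ by 9.6 and 9.7. This holds
analogously for all literals that are $\beta$-defined in $\Gamma$ and whose defining literal is $c \approx d$
or $f(a') \not\approx f(b')$. Thus we get:

$L_1 \prec_{\Gamma^*} \cdots \prec_{\Gamma^*} a \approx b \prec_{\Gamma^*} a \not\approx b \prec_{\Gamma^*} f(a) \approx f(b) \prec_{\Gamma^*} f(a) \not\approx f(b) \prec_{\Gamma^*} \cdots$
$\prec_{\Gamma^*} c \approx d \prec_{\Gamma^*} c \not\approx d \prec_{\Gamma^*} \cdots$
$\prec_{\Gamma^*} f(a') \not\approx f(b') \prec_{\Gamma^*} f(a') \approx f(b') \prec_{\Gamma^*} a' \approx b' \prec_{\Gamma^*} a' \not\approx b' \prec_{\Gamma^*} K_1 \prec_{\Gamma^*} \cdots$

where $K_i$ are the $\beta$-undefined literals and $L_i$ are the trivially defined literals.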
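The block structure of Definition 9 can be made concrete as a sort key: every trail literal $L_i$ of level greater than zero is immediately followed by $\mathrm{comp}(L_i)$ and then by the literals $M_{i,j}$ it defines, level-zero literals come first, and $\beta$-undefined literals come last. The following is a minimal Python sketch under loud assumptions: $\beta$-definedness, the defining literal of each $M_{i,j}$, and the $\prec_T$ ranks are taken as precomputed inputs (computing them needs the rewriting machinery), literals are plain strings such as `"a=b"` / `"a!=b"`, and all function names are hypothetical. The clause comparison uses the standard fact that, for a total order, the multiset extension coincides with lexicographic comparison of the descending-sorted elements.

```python
from typing import Dict, List, Set, Tuple


def comp(lit: str) -> str:
    """Complement of a ground (in)equation written as 's=t' or 's!=t'."""
    if "!=" in lit:
        return lit.replace("!=", "=", 1)
    return lit.replace("=", "!=", 1)


def trail_key(lit: str, trail: List[str], defining: Dict[str, int],
              level_zero: Set[str], rank_t: Dict[str, int]) -> Tuple[int, ...]:
    """Sort key mirroring the block structure of Definition 9 (a sketch).

    trail      -- trail literals of level > 0 in trail order (L_1, ..., L_n)
    defining   -- index of the defining trail literal for every other
                  beta-defined literal of level > 0 (the M_{i,j})
    level_zero -- beta-defined literals of level zero
    rank_t     -- position of a literal in the term ordering <_T (assumed given)
    """
    if lit in level_zero:                       # items 8 and 9: below everything
        return (0, rank_t[lit])
    if lit in trail:                            # L_i
        return (1, trail.index(lit), 0, 0)
    if comp(lit) in trail:                      # comp(L_i), directly above L_i
        return (1, trail.index(comp(lit)), 1, 0)
    if lit in defining:                         # M_{i,j}: above comp(L_i), by <_T
        return (1, defining[lit], 2, rank_t[lit])
    return (2, rank_t[lit])                     # items 10 and 11: beta-undefined


def clause_less(c: List[str], d: List[str], key) -> bool:
    """Multiset extension of a total order: compare descending key lists."""
    return sorted(map(key, c), reverse=True) < sorted(map(key, d), reverse=True)


# Data mirroring Example 11; the M-literals, their defining indices and the
# <_T ranks are illustrative choices, not taken from the paper.
trail = ["a=b", "c=d", "f(a')!=f(b')"]
defining = {"f(a)=f(b)": 0, "f(a)!=f(b)": 0, "a'=b'": 2, "a'!=b'": 2}
level_zero = {"a=a"}
rank_t = {"a=a": 0, "f(a)=f(b)": 1, "f(a)!=f(b)": 2,
          "a'=b'": 3, "a'!=b'": 4, "a=c": 5}


def key(lit: str) -> Tuple[int, ...]:
    return trail_key(lit, trail, defining, level_zero, rank_t)


lits = ["a=c", "f(a')=f(b')", "c!=d", "a'!=b'", "a=b", "f(a)!=f(b)",
        "a=a", "c=d", "a!=b", "a'=b'", "f(a')!=f(b')", "f(a)=f(b)"]
ordered = sorted(lits, key=key)
print(ordered)
```

Sorting these literals reproduces the shape of the chain in Example 11: `a=a` first, then the blocks of `a=b`, `c=d` and `f(a')!=f(b')`, and the undefined `a=c` last.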
Definition 12 (Rewrite Step). A rewrite step is a five-tuple $(s\#t\cdot\sigma, s\#t \vee C\cdot\sigma, R, S, p)$
and inductively defined as follows. The tuple $(s\#t\cdot\sigma, s\#t \vee C\cdot\sigma, \epsilon, \epsilon, \epsilon)$
is a rewrite step. Given rewrite steps $R, S$ and a position $p$, then
$(s\#t\cdot\sigma, s\#t \vee C\cdot\sigma, R, S, p)$ is a rewrite step. The literal $s\#t$ is called the rewrite
literal. In case $R, S$ are not $\epsilon$, the rewrite literal of $R$ is an equation.


Rewriting is one of the core features of our calculus. The following definition
describes a rewrite inference between two clauses. Note that, unlike the superposition
calculus, we allow rewriting below the variable level.


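Definition 13 (Rewrite Inference). Let $I_1 := (l_1 \approx r_1\cdot\sigma_1, l_1 \approx r_1 \vee C_1\cdot\sigma_1, R_1, L_1, p_1)$
and $I_2 := (l_2\#r_2\cdot\sigma_2, l_2\#r_2 \vee C_2\cdot\sigma_2, R_2, L_2, p_2)$ be two variable
disjoint rewrite steps where $r_1\sigma_1 \prec_T l_1\sigma_1$ and $(l_2\#r_2)\sigma_2|_p = l_1\sigma_1$ for some position
$p$. We distinguish two cases:


1. If $p \in \mathrm{pos}(l_2\#r_2)$ and $\mu := \mathrm{mgu}((l_2\#r_2)|_p, l_1)$, then $(((l_2\#r_2)[r_1]_p)\mu\cdot\sigma_1\sigma_2,
((l_2\#r_2)[r_1]_p)\mu \vee C_1\mu \vee C_2\mu\cdot\sigma_1\sigma_2, I_1, I_2, p)$ is the result of a rewrite inference.

2. If $p \notin \mathrm{pos}(l_2\#r_2)$, then let $(l_2\#r_2)\delta$ be the most general instance of $l_2\#r_2$ such
that $p \in \mathrm{pos}((l_2\#r_2)\delta)$, $\delta$ introduces only fresh variables and $(l_2\#r_2)\delta\sigma_2\rho =
(l_2\#r_2)\sigma_2$ for some minimal $\rho$. Let $\mu := \mathrm{mgu}((l_2\#r_2)\delta|_p, l_1)$. Then
$((l_2\#r_2)\delta[r_1]_p\mu\cdot\sigma_1\sigma_2\rho, (l_2\#r_2)\delta[r_1]_p\mu \vee C_1\mu \vee C_2\delta\mu\cdot\sigma_1\sigma_2\rho, I_1, I_2, p)$ is the
result of a rewrite inference.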


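Lemma 14. Let $I_1 := (l_1 \approx r_1\cdot\sigma_1, l_1 \approx r_1 \vee C_1\cdot\sigma_1, R_1, L_1, p_1)$ and
$I_2 := (l_2\#r_2\cdot\sigma_2, l_2\#r_2 \vee C_2\cdot\sigma_2, R_2, L_2, p_2)$ be two variable disjoint rewrite steps
where $r_1\sigma_1 \prec_T l_1\sigma_1$ and $(l_2\#r_2)\sigma_2|_p = l_1\sigma_1$ for some position $p$. Let
$I_3 := (l_3\#r_3\cdot\sigma_3, l_3\#r_3 \vee C_3\cdot\sigma_3, I_1, I_2, p)$ be the result of a rewrite inference. Then:


1. $C_3\sigma_3 = (C_1 \vee C_2)\sigma_1\sigma_2$ and $(l_3\#r_3)\sigma_3 = (l_2\#r_2)\sigma_2[r_1\sigma_1]_p$.

2. $(l_3\#r_3)\sigma_3 \prec_T (l_2\#r_2)\sigma_2$.

3. If $N \models (l_1 \approx r_1 \vee C_1) \wedge (l_2\#r_2 \vee C_2)$ for some set of clauses $N$, then
$N \models l_3\#r_3 \vee C_3$.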
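Property 1 of Lemma 14 says that, on the ground level, a rewrite inference replaces the subterm at position $p$ by $r_1\sigma_1$ and collects both side clauses. The Python sketch below covers only the ground analogue of case 1 of Definition 13 (all substitutions are the identity, so no unification is needed) plus a ground Equality-Resolution step; terms are nested tuples and every helper name is hypothetical. Applied to $C_2$ and $C_3$ from Example 18, it reproduces the clause $C_4$ derived there.

```python
from typing import List, Tuple

Term = tuple                       # ("f", ("a",)) stands for f(a)
Literal = Tuple[Term, Term, bool]  # (s, t, True) is s ≈ t, (s, t, False) is s ≉ t
Clause = List[Literal]


def subterm(t: Term, path: Tuple[int, ...]) -> Term:
    """t|_path; argument indices are 1-based because t[0] is the symbol."""
    for i in path:
        t = t[i]
    return t


def replace(t: Term, path: Tuple[int, ...], r: Term) -> Term:
    """t[r]_path."""
    if not path:
        return r
    i = path[0]
    return t[:i] + (replace(t[i], path[1:], r),) + t[i + 1:]


def ground_rewrite_inference(eq: Clause, target: Clause,
                             p: Tuple[int, ...]) -> Clause:
    """Ground analogue of case 1 of Definition 13 (substitutions trivial).

    eq[0] is the rewrite equation l1 ≈ r1 and target[0] the rewrite literal;
    p = (side, *path) with side 0 for the left and 1 for the right term.
    """
    (l1, r1, positive), c1 = eq[0], eq[1:]
    assert positive, "the first rewrite literal must be an equation"
    (s, t, sign), c2 = target[0], target[1:]
    sides = [s, t]
    side, path = p[0], p[1:]
    assert subterm(sides[side], path) == l1
    sides[side] = replace(sides[side], path, r1)
    return [(sides[0], sides[1], sign)] + c1 + c2


def ground_equality_resolution(clause: Clause) -> Clause:
    """Drop every literal s ≉ s (repeated ground Equality-Resolution)."""
    return [lit for lit in clause if lit[2] or lit[0] != lit[1]]


a, b, c, d = ("a",), ("b",), ("c",), ("d",)


def f(x: Term) -> Term:
    return ("f", x)


def g(x: Term) -> Term:
    return ("g", x)


C2 = [(a, b, True), (c, d, False)]               # a ≈ b ∨ c ≉ d
C3 = [(f(a), f(b), False), (g(c), g(d), False)]  # f(a) ≉ f(b) ∨ g(c) ≉ g(d)
step = ground_rewrite_inference(C2, C3, (0, 1))  # rewrite the a inside f(a)
C4 = ground_equality_resolution(step)            # c ≉ d ∨ g(c) ≉ g(d)
```

The first literal of `step` is $f(b) \not\approx f(b)$, exactly $(l_2\#r_2)[r_1]_p$, and the side literals come from both parents, as in property 1.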


Now that we have defined rewrite inferences, we can use them to define a
reduction chain application and a refutation, which are sequences of rewrite
steps. Intuitively speaking, a reduction chain application reduces a literal in a
clause with literals in $\mathrm{conv}(\Gamma)$ until it is irreducible. A refutation for a literal
$L$ that is $\beta$-false in $\Gamma$ for a given $\beta$ is a sequence of rewrite steps with literals
in $\Gamma, L$ such that $\bot$ is inferred. Refutations for the literals of the conflict clause
will be examined during conflict resolution by the rule Explore-Refutation.


Definition 15 (Reduction Chain). Let $\Gamma$ be a trail. A reduction chain $\mathcal{P}$
from $\Gamma$ is a sequence of rewrite steps $[I_1, \ldots, I_m]$ such that for each
$I_i = (s_i\#t_i\cdot\sigma_i, s_i\#t_i \vee C_i\cdot\sigma_i, I_j, I_k, p_i)$ either

1. $(s_i\#t_i)\sigma_i$, annotated with $(s_i\#t_i \vee C_i)\cdot\sigma_i$, is contained in $\Gamma$ and $I_j = I_k = p_i = \epsilon$, or

2. $I_i$ is the result of a rewrite inference from rewrite steps $I_j, I_k$ out of
$[I_1, \ldots, I_m]$ where $j, k < i$.

Let $(l\#r)\delta$ be a ground literal annotated with $(l\#r \vee C)\cdot\delta$. A reduction chain
application from $\Gamma$ to $l\#r$ is a reduction chain $[I_1, \ldots, I_m]$ from $\Gamma$ extended by this
annotated literal such that $l\delta\downarrow_{\mathrm{conv}(\Gamma)} = s_m\sigma_m$ and $r\delta\downarrow_{\mathrm{conv}(\Gamma)} = t_m\sigma_m$. We assume reduction
chain applications to be minimal, i.e., if any rewrite step is removed from the
sequence it is no longer a reduction chain application.
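The effect of a reduction chain application is to bring both sides of a literal to their normal forms $l\delta\downarrow_{\mathrm{conv}(\Gamma)}$ and $r\delta\downarrow_{\mathrm{conv}(\Gamma)}$. A minimal Python sketch of that normalization follows, assuming $\mathrm{conv}(\Gamma)$ is already available as a list of oriented, convergent ground rules; it records which rules fire, in order, but it does not build the full five-tuple rewrite steps with their clauses. Names such as `normalize` are hypothetical. The rules used are the equations $f(f(b)) \approx b$ and $f(f(a)) \approx a$ that Example 19 propagates, oriented by that example's KBO.

```python
from typing import List, Optional, Tuple

Term = tuple                  # ("f", ("a",)) stands for f(a); constants are 1-tuples
Rule = Tuple[Term, Term]


def rewrite_once(t: Term, rules: List[Rule]) -> Optional[Tuple[Term, Rule]]:
    """One leftmost-outermost step with ground rules l -> r, or None."""
    for l, r in rules:
        if t == l:
            return r, (l, r)
    for i in range(1, len(t)):
        res = rewrite_once(t[i], rules)
        if res is not None:
            new_arg, rule = res
            return t[:i] + (new_arg,) + t[i + 1:], rule
    return None


def normalize(t: Term, rules: List[Rule]) -> Tuple[Term, List[Rule]]:
    """Rewrite t to its normal form and record the rules used, in order."""
    trace: List[Rule] = []
    while (res := rewrite_once(t, rules)) is not None:
        t, rule = res
        trace.append(rule)
    return t, trace


a, b = ("a",), ("b",)


def f(x: Term) -> Term:
    return ("f", x)


rules = [(f(f(b)), b), (f(f(a)), a)]   # assumed conv(Gamma), oriented by KBO
lhs, used = normalize(f(f(a)), rules)  # left side of f(f(a)) ≈ b
rhs, _ = normalize(b, rules)
```

Here `f(f(a)) ≈ b` reduces to `a ≈ b`, using exactly one trail equation; the recorded trace plays the role of the rewrite steps that a minimal reduction chain application would keep.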


Definition 16 (Refutation). Let $\Gamma$ be a trail and $(l\#r)\delta$ a ground literal annotated
with $(l\#r \vee C)\cdot\delta$ that is $\beta$-false in $\Gamma$ for a given $\beta$. A refutation $\mathcal{P}$ from
$\Gamma$ and $l\#r$ is a reduction chain $[I_1, \ldots, I_m]$ from $\Gamma$ extended by this annotated literal such that
$(s_m\#t_m)\sigma_m = s \not\approx s$ for some $s$. We assume refutations to be minimal, i.e.,
if any rewrite step $I_k$, $k < m$, is removed from the refutation, it is no longer a
refutation.


3.1 The SCL(EQ) Inference Rules 


We can now define the rules of our calculus based on the previous definitions.
A state is a six-tuple $(\Gamma; N; U; \beta; k; D)$ similar to the SCL calculus, where $\Gamma$ is a
sequence of annotated ground literals, $N$ and $U$ are the sets of initial and learned
clauses, $\beta$ is a ground term such that $L \prec_T \beta$ holds for all $L \in \Gamma$, $k$ is
the decision level, and $D$ is a status that is $\top$, $\bot$ or a closure $C\cdot\sigma$. Before we
propagate or decide any literal, we make sure that it is irreducible in the current
trail. Together with the design of $\prec_{\Gamma^*}$, this eventually enables rewriting as a
simplification rule.


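Propagate

$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma, (s_m\#t_m)\sigma_m^{k:(s_m\#t_m \vee C_m)\cdot\sigma_m}; N; U; \beta; k; \top)$
provided there is a $C \in (N \cup U)$, $\sigma$ grounding for $C$, $C = C_0 \vee C_1 \vee L$, $\Gamma \models \neg C_0\sigma$,
$C_1\sigma = L\sigma \vee \ldots \vee L\sigma$, $C_1 = L_1 \vee \ldots \vee L_n$, $\mu = \mathrm{mgu}(L_1, \ldots, L_n, L)$, $L\sigma$ is $\beta$-undefined
in $\Gamma$, $(C_0 \vee L)\mu\sigma \prec_T \beta$, $\sigma$ is irreducible by $\mathrm{conv}(\Gamma)$, $[I_1, \ldots, I_m]$ is a reduction
chain application from $\Gamma$ to $L\sigma^{k:(L \vee C_0)\mu\cdot\sigma}$ where $I_m = (s_m\#t_m\cdot\sigma_m, s_m\#t_m \vee
C_m\cdot\sigma_m, I_j, I_k, p_m)$.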


Note that the definition of Propagate also includes the case where $L\sigma$ is
irreducible by $\Gamma$. In this case $L = s_m\#t_m$ and $m = 1$. The rule Decide below
is similar to Propagate, except for the subclause $C_0$, which must be $\beta$-undefined
or $\beta$-true in $\Gamma$, i.e., Propagate cannot be applied, and the decision literal is
annotated by a tautology.


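Decide

$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma, (s_m\#t_m)\sigma_m^{k+1:(s_m\#t_m \vee \mathrm{comp}(s_m\#t_m))\cdot\sigma_m}; N; U;
\beta; k+1; \top)$
provided there is a $C \in (N \cup U)$, $\sigma$ grounding for $C$, $C = C_0 \vee L$, $C_0\sigma$ is
$\beta$-undefined or $\beta$-true in $\Gamma$, $L\sigma$ is $\beta$-undefined in $\Gamma$, $(C_0 \vee L)\sigma \prec_T \beta$, $\sigma$ is
irreducible by $\mathrm{conv}(\Gamma)$, $[I_1, \ldots, I_m]$ is a reduction chain application from $\Gamma$ to
$L\sigma^{k+1:(L \vee \mathrm{comp}(L))\cdot\sigma}$ where $I_m = (s_m\#t_m\cdot\sigma_m, s_m\#t_m \vee C_m\cdot\sigma_m, I_j, I_k, p_m)$.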
Conflict

$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma; N; U; \beta; k; D)$

provided there is a $D' \in (N \cup U)$, $\sigma$ grounding for $D'$, $D'\sigma$ is $\beta$-false in $\Gamma$, $\sigma$ is
irreducible by $\mathrm{conv}(\Gamma)$, $D = \bot$ if $D'\sigma$ is of level 0 and $D = D'\cdot\sigma$ otherwise.


For the non-equational case, when a conflict clause is found by an SCL calculus
[14,18], the complements of its first-order ground literals are contained in the
trail. For equational literals this is not the case in general. The proof showing
$D$ to be $\beta$-false with respect to $\Gamma$ is a rewrite proof with respect to $\mathrm{conv}(\Gamma)$.
This proof needs to be analyzed to eventually perform paramodulation steps on
$D$ or to replace $D$ by a $\prec_{\Gamma^*}$-smaller $\beta$-false clause showing up in the proof.


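Skip
$(\Gamma, K^{l:C\cdot\tau}, L^{k:C'\cdot\tau'}; N; U; \beta; k; D\cdot\sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma, K^{l:C\cdot\tau}; N; U; \beta; l; D\cdot\sigma)$ if
$D\sigma$ is $\beta$-false in $\Gamma, K^{l:C\cdot\tau}$.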


The Explore-Refutation rule is the counterpart, for first-order logic with equality,
of the Resolve rule in CDCL or SCL. While in CDCL or SCL complementary literals of
the conflict clause are present on the trail and can directly be used for resolution
steps, this needs a generalization for FOL with equality. Here, in general, we need
to look at (rewriting) refutations of the conflict clause and pick an appropriate
clause from the refutation as the next conflict clause.


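Explore-Refutation

$(\Gamma, L; N; U; \beta; k; (D \vee s\#t)\cdot\sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma, L; N; U; \beta; k; (s_j\#t_j \vee C_j)\cdot\sigma_j)$
if $(s\#t)\sigma$ is strictly $\prec_{\Gamma^*}$-maximal in $(D \vee s\#t)\sigma$, $L$ is the defining literal of
$(s\#t)\sigma$, $[I_1, \ldots, I_m]$ is a refutation from $\Gamma$ and $(s\#t)\sigma$, $I_j = (s_j\#t_j\cdot\sigma_j, (s_j\#t_j \vee
C_j)\cdot\sigma_j, I_l, I_k, p_j)$, $1 \leq j \leq m$, $(s_j\#t_j \vee C_j)\sigma_j \prec_{\Gamma^*} (D \vee s\#t)\sigma$, and $(s_j\#t_j \vee C_j)\sigma_j$
is $\beta$-false in $\Gamma$.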


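Factorize
$(\Gamma; N; U; \beta; k; (D \vee L \vee L')\cdot\sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma; N; U; \beta; k; (D \vee L)\mu\cdot\sigma)$
provided $L\sigma = L'\sigma$ and $\mu = \mathrm{mgu}(L, L')$.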


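Equality-Resolution
$(\Gamma; N; U; \beta; k; (D \vee s \not\approx s')\cdot\sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma; N; U; \beta; k; D\mu\cdot\sigma)$
provided $s\sigma = s'\sigma$ and $\mu = \mathrm{mgu}(s, s')$.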


Backtrack

$(\Gamma, K, \Gamma'; N; U; \beta; k; (D \vee L)\cdot\sigma) \Rightarrow_{\mathrm{SCL(EQ)}} (\Gamma; N; U \cup \{D \vee L\}; \beta; j - i; \top)$
provided $D\sigma$ is of level $i'$ where $i' < k$, $K$ is of level $j$ and $\Gamma, K$ is the minimal trail
subsequence such that there is a grounding substitution $\tau$ with $(D \vee L)\tau$ $\beta$-false
in $\Gamma, K$ but not in $\Gamma$; $i = 1$ if $K$ is a decision literal and $i = 0$ otherwise.


Grow
$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\epsilon; N; U; \beta'; 0; \top)$
provided $\beta \prec_T \beta'$.
In addition to soundness and completeness of the SCL(EQ) rules, their
tractability in practice is an important property for a successful implementation,
in particular finding propagating literals or detecting a false clause under
some grounding. It turns out that these operations are NP-complete, similar to
first-order subsumption, which has been shown to be tractable in practice.


Lemma 17. Assume that all ground terms $t$ with $t \prec_T \beta$ for any $\beta$ are polynomial
in the size of $\beta$. Then testing Propagate (Conflict) is NP-complete, i.e.,
the problem of checking for a given clause $C$ whether there exists a grounding
substitution $\sigma$ such that $C\sigma$ propagates (is false) is NP-complete.
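The search problem behind Lemma 17 can be illustrated with a deliberately naive procedure that tries every grounding substitution over a finite candidate set. The Python sketch below assumes the non-equational reading mentioned before Skip (a literal is false exactly when its complement is on the trail), and it uses all ground terms up to a fixed nesting depth as a hypothetical stand-in for the terms below $\beta$. It enumerates $|\text{candidates}|^{\#\text{variables}}$ substitutions, which is the exponential behaviour one expects for an NP-complete test; all names are illustrative.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Term = tuple                   # ground: ("f", ("a",)); variables are plain strings
Literal = Tuple[bool, tuple]   # (True, atom) is atom, (False, atom) is ¬atom


def ground_terms(consts: List[str], funcs: List[Tuple[str, int]],
                 depth: int) -> Set[Term]:
    """All ground terms up to the given nesting depth (stand-in for 'below beta')."""
    terms = {(c,) for c in consts}
    for _ in range(depth):
        new = set(terms)
        for fn, arity in funcs:
            for args in product(terms, repeat=arity):
                new.add((fn, *args))
        terms = new
    return terms


def variables(t) -> Set[str]:
    if isinstance(t, str):
        return {t}
    return set().union(*(variables(x) for x in t[1:]))


def instantiate(t, sigma: Dict[str, Term]):
    if isinstance(t, str):
        return sigma[t]
    return (t[0], *(instantiate(x, sigma) for x in t[1:]))


def find_conflict(clause: List[Literal], trail: Set[Literal],
                  candidates: Set[Term]) -> Optional[Dict[str, Term]]:
    """Brute-force search for sigma such that every literal of clause*sigma has
    its complement on the trail (non-equational case only)."""
    vs = sorted(set().union(*(variables(atom) for _, atom in clause)))
    for values in product(sorted(candidates), repeat=len(vs)):
        sigma = dict(zip(vs, values))
        if all((not pos, instantiate(atom, sigma)) in trail for pos, atom in clause):
            return sigma
    return None


a = ("a",)
trail = {(True, ("P", a)), (False, ("Q", ("f", a)))}          # P(a), ¬Q(f(a))
clause = [(False, ("P", "X")), (True, ("Q", ("f", "X")))]     # ¬P(X) ∨ Q(f(X))
cands = ground_terms(["a", "b"], [("f", 1)], 1)               # a, b, f(a), f(b)
sigma = find_conflict(clause, trail, cands)
```

With the given trail only $X \mapsto a$ makes both literals false, so that is the substitution returned; removing $\neg Q(f(a))$ from the trail leaves no conflicting grounding.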


Example 18 (SCL(EQ) vs. Superposition: Saturation). Consider the following
clauses:


$N := \{C_1 := c \approx d \vee D,\ C_2 := a \approx b \vee c \not\approx d,\ C_3 := f(a) \not\approx f(b) \vee g(c) \not\approx g(d)\}$


where again we assume a KBO with all symbols having weight one, precedence
$d \prec c \prec b \prec a \prec g \prec f$ and $\beta := f(f(g(a)))$. Suppose that we first decide
$c \approx d$ and then propagate $a \approx b$: $\Gamma = [c \approx d^{1:c \approx d \vee c \not\approx d}, a \approx b^{1:C_2}]$. Now we have a
conflict with $C_3$. Explore-Refutation applied to the conflict clause $C_3$ results in a
paramodulation inference between $C_3$ and $C_2$. Another application of Equality-Resolution
gives us the new conflict clause $C_4 := c \not\approx d \vee g(c) \not\approx g(d)$. Now we can
Skip the last literal on the trail, which gives us $\Gamma = [c \approx d^{1:c \approx d \vee c \not\approx d}]$. Another
application of the Explore-Refutation rule to $C_4$ using the decision justification
clause, followed by Equality-Resolution and Factorize, gives us $C_5 := c \not\approx d$. Thus
with SCL(EQ) the following clauses remain:


$C_1 := D$, $C_5 := c \not\approx d$
$C_3 := f(a) \not\approx f(b) \vee g(c) \not\approx g(d)$


where we derived the new $C_1$ out of the original $C_1$ by subsumption resolution [33] using $C_5$,
and $C_2$ is subsumed by $C_5$. Actually, subsumption resolution is compatible with the general
redundancy notion of SCL(EQ), see Lemma 25. Now we consider the same example with superposition
and the very same ordering ($N_i$ is the clause set of the previous step and $N_0$ the
initial clause set $N$).


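$N_0 \Rightarrow_{\mathrm{Sup}(C_2,C_3)} N_1 \cup \{C_4 := c \not\approx d \vee g(c) \not\approx g(d)\}$
$\Rightarrow_{\mathrm{Sup}(C_1,C_4)} N_2 \cup \{C_5 := c \not\approx d \vee D\} \Rightarrow_{\mathrm{Sup}(C_1,C_5)} N_3 \cup \{C_6 := D\}$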


Thus superposition ends up with the following clauses: 


$C_2 := a \approx b \vee c \not\approx d$, $C_3 := f(a) \not\approx f(b) \vee g(c) \not\approx g(d)$
$C_4 := c \not\approx d \vee g(c) \not\approx g(d)$, $C_6 := D$


The superposition calculus generates more and larger clauses. 


Example 19 (SCL(EQ) vs. Superposition: Refutation). Suppose the following set
of clauses: $N := \{C_1 := f(x) \not\approx a \vee f(x) \approx b,\ C_2 := f(f(y)) \approx y,\ C_3 := a \not\approx b\}$
where again we assume a KBO with all symbols having weight one, precedence
$b \prec a \prec f$ and $\beta := f(f(f(a)))$. A long refutation by the superposition calculus
results in the following ($N_i$ is the clause set of the previous step and $N_0$ the
initial clause set $N$):


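$N_0 \Rightarrow_{\mathrm{Sup}(C_1,C_2)} N_1 \cup \{C_4 := y \not\approx a \vee f(f(y)) \approx b\}$

$\Rightarrow_{\mathrm{Sup}(C_1,C_4)} N_2 \cup \{C_5 := a \not\approx b \vee f(f(y)) \approx b \vee y \not\approx a\}$
$\Rightarrow_{\mathrm{Sup}(C_2,C_5)} N_3 \cup \{C_6 := a \not\approx b \vee y \approx b \vee y \not\approx a\}$

$\Rightarrow_{\mathrm{Sup}(C_2,C_6)} N_4 \cup \{C_7 := y \approx b \vee y \not\approx a\}$

$\Rightarrow_{\mathrm{EqRes}(C_7)} N_5 \cup \{C_8 := a \approx b\} \Rightarrow_{\mathrm{Sup}(C_3,C_8)} N_6 \cup \{\bot\}$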


The shortest refutation by the superposition calculus is as follows: 


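$N_0 \Rightarrow_{\mathrm{Sup}(C_1,C_2)} N_1 \cup \{C_4 := y \not\approx a \vee f(f(y)) \approx b\}$
$\Rightarrow_{\mathrm{Sup}(C_2,C_4)} N_2 \cup \{C_5 := y \approx b \vee y \not\approx a\}$
$\Rightarrow_{\mathrm{EqRes}(C_5)} N_3 \cup \{C_6 := a \approx b\} \Rightarrow_{\mathrm{Sup}(C_3,C_6)} N_4 \cup \{\bot\}$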


In SCL(EQ), on the other hand, we would always first propagate $a \not\approx b$, $f(f(a)) \approx a$
and $f(f(b)) \approx b$. As soon as $a \not\approx b$ and $f(f(a)) \approx a$ are propagated we have a
conflict with $C_1\{x \mapsto f(a)\}$. So suppose in the worst case we propagate:


$\Gamma := [a \not\approx b^{0:a \not\approx b}, f(f(b)) \approx b^{0:(f(f(y)) \approx y)\cdot\{y \mapsto b\}}, f(f(a)) \approx a^{0:(f(f(y)) \approx y)\cdot\{y \mapsto a\}}]$


Now we have a conflict with $C_1\{x \mapsto f(a)\}$. Since there is no decision literal on
the trail, the Conflict rule immediately returns $\bot$ and we are done.
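The conflict claimed above can be checked mechanically on the ground level. The Python sketch below uses a simplified reading of $\beta$-falsity that is an assumption sufficient for this example, not the full definition: a negative literal is false if both sides have the same normal form under the propagated equations, and a positive literal is false if its normalized form occurs as a negative trail literal. The rules are taken as given and convergent; `nf`, `is_false` and `C1` are hypothetical names.

```python
from typing import List, Set, Tuple

Term = tuple
Rule = Tuple[Term, Term]


def nf(t: Term, rules: List[Rule]) -> Term:
    """Innermost normalization with convergent ground rules (assumed given)."""
    t = (t[0], *(nf(x, rules) for x in t[1:]))
    for l, r in rules:
        if t == l:
            return nf(r, rules)
    return t


def is_false(lit: Tuple[Term, Term, bool], rules: List[Rule],
             neg_trail: Set[Tuple[Term, Term]]) -> bool:
    s, t, positive = lit
    ns, nt = nf(s, rules), nf(t, rules)
    if not positive:
        return ns == nt                  # s ≉ t with equal normal forms
    return (ns, nt) in neg_trail or (nt, ns) in neg_trail


a, b = ("a",), ("b",)


def f(x: Term) -> Term:
    return ("f", x)


rules = [(f(f(b)), b), (f(f(a)), a)]     # the propagated equations, oriented
neg_trail = {(a, b)}                     # a ≉ b, already irreducible


def C1(x: Term):                         # C1{x -> x}: f(x) ≉ a ∨ f(x) ≈ b
    return [(f(x), a, False), (f(x), b, True)]


conflict = all(is_false(lit, rules, neg_trail) for lit in C1(f(a)))
```

For $x \mapsto f(a)$ both literals are false, while for instance $x \mapsto b$ leaves $f(b) \not\approx a$ not false, so no conflict arises from that instance.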


4 Soundness and Completeness 


In this section we show soundness and refutational completeness of SCL(EQ) 
under the assumption of a regular run. We provide the definition of a regular run 
and show that for a regular run all learned clauses are non-redundant according 
to our trail induced ordering. We start with the definition of a sound state. 


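Definition 20. A state $(\Gamma; N; U; \beta; k; D)$ is sound if the following conditions
hold:


1. $\Gamma$ is a consistent sequence of annotated literals,

2. for each decomposition $\Gamma = \Gamma_1, L\sigma^{i:(C \vee L)\cdot\sigma}, \Gamma_2$ where $L\sigma$ is a propagated
literal, we have that $C\sigma$ is $\beta$-false in $\Gamma_1$, $L\sigma$ is $\beta$-undefined in $\Gamma_1$ and irreducible
by $\mathrm{conv}(\Gamma_1)$, $N \cup U \models (C \vee L)$ and $(C \vee L)\sigma \prec_T \beta$,

3. for each decomposition $\Gamma = \Gamma_1, L\sigma^{i:(L \vee \mathrm{comp}(L))\cdot\sigma}, \Gamma_2$ where $L\sigma$ is a decision
literal, we have that $L\sigma$ is $\beta$-undefined in $\Gamma_1$ and irreducible by $\mathrm{conv}(\Gamma_1)$,
$N \cup U \models (L \vee \mathrm{comp}(L))$ and $(L \vee \mathrm{comp}(L))\sigma \prec_T \beta$,

4. $N \models U$,

5. if $D = C\cdot\sigma$, then $C\sigma$ is $\beta$-false in $\Gamma$ and $N \cup U \models C$.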


Lemma 21. The initial state $(\epsilon; N; \emptyset; \beta; 0; \top)$ is sound.


Definition 22. A run is a sequence of applications of SCL(EQ) rules starting 
from the initial state. 
Theorem 23. Assume a state $(\Gamma; N; U; \beta; k; D)$ resulting from a run. Then
$(\Gamma; N; U; \beta; k; D)$ is sound.


Next, we give the definition of a regular run. Intuitively speaking, in a regular
run we are always allowed to make decisions except if


1. a literal can be propagated before the first decision, or
2. the negation of a literal can be propagated.


To ensure non-redundant learning we enforce at least one application of Skip 
during conflict resolution except for the special case of a conflict after a decision. 


Definition 24 (Regular Run). A run is called regular if 


1. the rules Conflict and Factorize have precedence over all other rules,

2. if $k = 0$ in a state $(\Gamma; N; U; \beta; k; D)$, then Propagate has precedence over
Decide,

3. if an annotated literal $L^{k:C\cdot\sigma}$ could be added by an application of Propagate
on $\Gamma$ in a state $(\Gamma; N; U; \beta; k; D)$ and $C \in N \cup U$, then the annotated literal
$\mathrm{comp}(L)^{k+1:(\mathrm{comp}(L) \vee L)\cdot\sigma}$ is not added by Decide on $\Gamma$,

4. during conflict resolution Skip is applied at least once, except if Conflict is
applied immediately after an application of Decide,

5. if Conflict is applied immediately after an application of Decide, then Backtrack
is only applied in a state $(\Gamma, L'; N; U; \beta; k; D\cdot\sigma)$ if $L\sigma = \mathrm{comp}(L')$ for
some $L \in D$.


Now we show that any learned clause in a regular run is non-redundant 
according to our trail induced ordering. 


Lemma 25 (Non-Redundant Clause Learning). Let $N$ be a clause set.
The clauses learned during a regular run in SCL(EQ) are not redundant with
respect to $\prec_{\Gamma^*}$ and $N \cup U$. For the trail only non-redundant clauses need to be
considered.


The proof of Lemma 25 is based on the fact that conflict resolution eventually
produces a clause smaller than the original conflict clause with respect to $\prec_{\Gamma^*}$.
All simplifications, e.g., contextual rewriting, as defined in [2,20,33,35–37], are
therefore compatible with Lemma 25 and may be applied to the newly learned
clause as long as they respect the induced trail ordering. In detail, let $\Gamma$ be the
trail before the application of rule Backtrack. The newly learned clause can be
simplified according to the induced trail ordering $\prec_{\Gamma^*}$ as long as the simplified
clause is smaller with respect to $\prec_{\Gamma^*}$.
Another important consequence of Lemma 25 is that newly learned clauses
need not be considered for redundancy. Furthermore, the SCL(EQ) calculus
always terminates, Lemma 33, because there are only finitely many non-redundant
clauses with respect to a fixed $\beta$.

For dynamic redundancy, we have to consider the fact that the induced trail
ordering changes. At this level, only redundancy criteria and simplifications that
are compatible with all induced trail orderings may be applied. Due to the
construction of the induced trail ordering, it is compatible with $\prec_T$ for unit
clauses.


Lemma 26 (Unit Rewriting). Assume a state $(\Gamma; N; U; \beta; k; D)$ resulting
from a regular run where the current level $k > 0$ and a unit clause $l \approx r \in N$.
Now assume a clause $C \vee L[l']_p \in N$ such that $l' = l\mu$ for some matcher $\mu$. Now
assume some arbitrary grounding substitutions $\sigma'$ for $C \vee L[l']_p$ and $\sigma$ for $l \approx r$
such that $l\sigma = l'\sigma'$ and $r\sigma \prec_T l\sigma$. Then $(C \vee L[r\sigma]_p)\sigma' \prec_{\Gamma^*} (C \vee L[l']_p)\sigma'$.


In addition, any notion that is based on a literal subset relationship is also 
compatible with ordering changes. The standard example is subsumption. 


Lemma 27. Let $C, D$ be two clauses. If there exists a substitution $\sigma$ such that
$C\sigma \subset D$, then $D$ is redundant with respect to $C$ and any $\prec_{\Gamma^*}$.


The notion of redundancy, Definition 1, only supports a strict subset relation
for Lemma 27, similar to the superposition calculus. However, the newly generated
clauses of SCL(EQ) are the result of paramodulation inferences [28]. In a
recent contribution to dynamic, abstract redundancy [32] it is shown that also
the non-strict subset relation in Lemma 27, i.e., $C\sigma \subseteq D$, preserves completeness.

If all stuck states (see Definition 28 below) with respect to a fixed $\beta$ are visited
before increasing $\beta$, then this provides a simple dynamic fairness strategy.

When unit reduction or any other form of supported rewriting is applied to
clauses smaller than the current $\beta$, it can be applied independently from the
current trail. If, however, unit reduction is applied to clauses larger than the
current $\beta$, then the calculus must do a restart to its initial state; in particular,
the trail must be emptied, as otherwise rewriting may result in a
conflict that did not exist with respect to the current trail before the rewriting.
This is analogous to a restart in CDCL once a propositional unit clause is derived
and used for simplification. More formally, we add the following new Restart rule
to the calculus to reset the trail to its initial state after a unit reduction.


Restart
$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL(EQ)}} (\epsilon; N; U; \beta; 0; \top)$

Next we show refutational completeness of SCL(EQ). To achieve this, we first
give a definition of a stuck state. Then we show that stuck states only occur if
all ground literals $L \prec_T \beta$ are $\beta$-defined in $\Gamma$, and not during conflict resolution.
Finally we show that conflict resolution will always result in an application of
Backtrack. This allows us to show termination (without application of Grow)
and refutational completeness.


Definition 28 (Stuck State). A state $(\Gamma; N; U; \beta; k; D)$ is called stuck if
$D \neq \bot$ and none of the rules of the calculus, except for Grow, is applicable.


Lemma 29 (Form of Stuck States). If a regular run (without rule Grow)
ends in a stuck state $(\Gamma; N; U; \beta; k; D)$, then $D = \top$ and all ground literals
$L\sigma \prec_T \beta$, where $L \vee C \in (N \cup U)$, are $\beta$-defined in $\Gamma$.
Lemma 30. Suppose a sound state $(\Gamma; N; U; \beta; k; D)$ resulting from a regular
run where $D \notin \{\top, \bot\}$. If Backtrack is not applicable, then any set of applications
of Explore-Refutation, Skip, Factorize, Equality-Resolution will finally result
in a sound state $(\Gamma'; N; U; \beta; k; D')$, where $D' \prec_{\Gamma^*} D$. Then Backtrack will be
finally applicable.


Corollary 31 (Satisfiable Clause Sets). Let $N$ be a satisfiable clause set.
Then any regular run without rule Grow will end in a stuck state, for any $\beta$.


Thus a stuck state can be seen as an indication for a satisfiable clause set.
Of course, it remains to be investigated whether the clause set is actually satisfiable.
Superposition is one of the strongest approaches to detect satisfiability and
constitutes a decision procedure for many decidable first-order fragments [4,19].
Now given a stuck state and some specific ordering such as KBO, LPO, or some
polynomial ordering [17], it is decidable whether the ordering can be instantiated
from a stuck state such that $\Gamma$ coincides with the superposition model operator
on the ground terms smaller than $\beta$. In this case it can be effectively checked
whether the clauses derived so far are actually saturated by the superposition
calculus with respect to this specific ordering. In this sense, SCL(EQ) has the
same power to decide satisfiability of first-order clause sets as superposition.


Definition 32. A regular run terminates in a state $(\Gamma; N; U; \beta; k; D)$ if $D = \top$
and no rule is applicable, or $D = \bot$.


Lemma 33. Let $N$ be a set of clauses and $\beta$ be a ground term. Then any regular
run that never uses Grow terminates.


Lemma 34. If a regular run reaches the state $(\Gamma; N; U; \beta; k; \bot)$, then $N$ is unsatisfiable.


Theorem 35 (Refutational Completeness). Let $N$ be an unsatisfiable
clause set, and $\prec_T$ a desired term ordering. For any ground term $\beta$ where
$\mathrm{gnd}_{\prec_T \beta}(N)$ is unsatisfiable, any regular SCL(EQ) run without rule Grow will
terminate by deriving $\bot$.


5 Discussion 


We presented SCL(EQ), a new sound and complete calculus for reasoning in first- 
order logic with equality. We will now discuss some of its aspects and present 
ideas for future work beyond the scope of this paper. 

The trail induced ordering, Definition 9, is the result of letting the calculus 
follow the logical structure of the clause set on the literal level and at the same 
time supporting rewriting at the term level. It can already be seen by examples on 
ground clauses over (in)equations over constants that this combination requires 
a layered approach as suggested by Definition 9, see [24]. 

In case the calculus runs into a stuck state, i.e., the current trail is a model
for the set of considered ground instances, then the trail information can be
effectively used for a guided continuation. For example, in order to use the trail
to certify a model, the trail literals can be used to guide the design of a lifted
ordering for the clauses with variables such that propagated trail literals are
maximal in respective clauses. Then it could be checked by superposition whether the
current clause set is saturated by such an ordering. If this is not the case, then
there must be a superposition inference larger than the current $\beta$, thus giving
a hint on how to extend $\beta$. Another possibility is to try to extend the finite
set of ground terms considered in a stuck state to the infinite set of all ground
terms by building extended equivalence classes following patterns that ensure
decidability of clause testing, similar to the ideas in [14]. If this fails, then again
this information can be used to find an appropriate extension term $\beta'$ for rule
Grow.

In contrast to superposition, SCL(EQ) also performs inferences below the variable
level. Inferences in SCL(EQ) are guided by a false clause with respect to a
partial model assumption represented by the trail. Due to this guidance and the
different style of reasoning, this does not result in an explosion in the number of
possibly inferred clauses but rather in the derivation of more general clauses,
see [24].

Currently, the reasoning with solely positive equations is done on and with 
respect to the trail. It is well-known that also inferences from this type of rea- 
soning can be used to speed up the overall reasoning process. The SCL(EQ) 
calculus already provides all information for such a type of reasoning, because it 
computes the justification clauses for trail reasoning via rewriting inferences. By 
an assessment of the quality of these clauses, e.g., their reduction potential with 
respect to trail literals, they could also be added, independently from resolving 
a conflict. 

The trail reasoning is currently defined with respect to rewriting. It could 
also be performed by congruence closure [26]. 

Towards an implementation, the question of how to find interesting ground
decision or propagation literals for the trail can be treated similarly to CDCL [11,21,25,29].
A simple heuristic may be used from the start, like counting the
number of instance relationships of some ground literal with respect to the clause
set, but later on a bonus system can focus the search towards the structure of the
clause sets. Ground literals involved in a conflict or in the process of learning a new
clause get a bonus or preference. The regular strategy requires the propagation of
all ground unit clauses smaller than $\beta$. For an implementation, a propagation of
the (explicit and implicit) unit clauses with variables to the trail will be a better
choice. This complicates the implementation of refutation proofs and rewriting
(congruence closure), but because all reasoning is layered by a ground term
$\beta$, this can still be done efficiently.
Abstract. We propose generalizations of reduction pairs, well-established
techniques for proving termination of term rewriting, in order to
prove unsatisfiability of reachability (infeasibility) in plain and conditional
term rewriting. We adapt the weighted path order, a merger of the
Knuth–Bendix order and the lexicographic path order, into the proposed
framework. The proposed approach is implemented in the termination
prover NaTT, and the strength of our approach is demonstrated through
examples and experiments.


1 Introduction 


In the research area of term rewriting, among the most well-studied topics are 
termination, confluence, and reachability analyses. 

In termination analysis, a crucial task used to be to design reduction orders,
well-founded orderings over terms that are closed under contexts and
substitutions. Well-known examples of such orderings include the Knuth–Bendix
ordering [14], polynomial interpretations [18], multiset/lexicographic path
ordering [4,13], and matrix interpretations [5]. The dependency pair framework
generalized reduction orders into reduction pairs [2,9,12], and there are a number
of implementations that automatically find reduction pairs, e.g., AProVE [7],
TTT2 [16], MU-TERM [11], NaTT [35], competing in the International Termination
Competition [8].

Traditional reachability analysis (cf. [6]) has been concerned with the pos- 
sibility of rewriting a given source term s to a target t, where variables in the 
terms are treated as constants. There is an increasing need for solving a more 
general question: is it possible to instantiate variables so that the instance of s 
rewrites to the instance of t? Let us illustrate the problem with an elementary 
example. 


Example 1. Consider the following TRS encoding addition of natural numbers:
$\mathcal{R}_{\mathrm{add}} := \{\ \mathrm{add}(0, y) \to y,\ \mathrm{add}(s(x), y) \to s(\mathrm{add}(x, y))\ \}$

The reachability constraint $\mathrm{add}(s(x), y) \twoheadrightarrow y$ represents the possibility of rewriting
from $\mathrm{add}(s(x), y)$ to $y$, where variables $x$ and $y$ can be arbitrary terms.
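A bounded search over ground instances gives a quick feel for this constraint, although it is not a proof: $x$ and $y$ range over arbitrary terms, and proving that no instance exists is exactly what the techniques of this paper are for. The Python sketch below is a hypothetical helper, not part of NaTT. It computes all one-step reducts of a ground term under $\mathcal{R}_{\mathrm{add}}$ and explores them breadth-first; since $\mathcal{R}_{\mathrm{add}}$ terminates and is finitely branching, each search is finite.

```python
from collections import deque
from typing import List

# Terms: "0", ("s", t), ("add", t, u). Only ground terms are explored.


def rewrite_steps(t) -> List:
    """All one-step reducts of t under R_add, at any position."""
    out = []
    if isinstance(t, tuple) and t[0] == "add":
        x, y = t[1], t[2]
        if x == "0":
            out.append(y)                              # add(0, y) -> y
        elif isinstance(x, tuple) and x[0] == "s":
            out.append(("s", ("add", x[1], y)))        # add(s(x), y) -> s(add(x, y))
    if isinstance(t, tuple):
        for i in range(1, len(t)):
            for r in rewrite_steps(t[i]):
                out.append(t[:i] + (r,) + t[i + 1:])
    return out


def reachable(src, tgt) -> bool:
    """Breadth-first search over the (finitely many) reducts of src."""
    seen, todo = {src}, deque([src])
    while todo:
        t = todo.popleft()
        if t == tgt:
            return True
        for r in rewrite_steps(t):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return False


def num(n: int):
    """The numeral s^n(0)."""
    t = "0"
    for _ in range(n):
        t = ("s", t)
    return t


# Bounded evidence only: x and y range over small numerals.
witnesses = [(x, y) for x in range(4) for y in range(4)
             if reachable(("add", ("s", num(x)), num(y)), num(y))]
print(witnesses)  # prints []
```

Every reduct of $\mathrm{add}(s(x), y)$ with numerals still contains $\mathrm{add}$ or is the numeral for $x + 1 + y$, which is why no witness shows up.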
This (un)satisfiability problem of reachability, also called (in)feasibility, plays
important roles in termination [24] and confluence analyses of (conditional)
rewriting [21]. A tool competition dedicated to this problem has been held
as the infeasibility (INF) category of the International Confluence Competition
(CoCo) since 2019 [25].

In this paper, we propose a new method for proving unsatisfiability of reach- 
ability, using the term ordering techniques developed for termination analysis. 
Specifically, in Sect. 3, we first generalize reduction pairs to rewrite pairs, and
show that they can be used for proving unsatisfiability of reachability. We further
generalize the notion to co-rewrite pairs, yielding a sound and complete method.
The power of the proposed method is demonstrated by importing (relaxed)
semantic term orderings from termination analysis.

In order to import also syntactic term orderings, in Sect. 4 we identify a
condition under which the weighted path order (WPO) [36] forms a rewrite pair. Since
KBO and LPO are instances of WPO, these orderings can also be
used in our method. In Sect. 5 we also present how to derive co-rewrite pairs
from WPO.

In Sect. 6, we adapt the approach to conditional rewriting. Section 7 reports
on the implementation and experiments conducted on examples in the paper and
the benchmark set of CoCo 2021.


Related Work. Our rewrite pairs are essentially Aoto's discrimination pairs [1]
which are closed under substitutions. On the way to disproving confluence, Aoto
introduced discrimination pairs and used them in proving non-joinability. The
joinability of terms $s$ and $t$ is expressed as $\exists u.\ s \to_{\mathcal{R}}^* u \leftarrow_{\mathcal{R}}^* t$, while the current
paper is concerned with $\exists\theta.\ s\theta \to_{\mathcal{R}}^* t\theta$. As substitutions are not considered,
discrimination pairs do not need closure under substitutions, and Aoto's insights
are mainly for dealing with the reverse rewriting $\leftarrow_{\mathcal{R}}$.

Lucas and Gutiérrez [19] proposed reducing infeasibility to the model finding 
of first-order logic. Our formulations especially in Sect.6 are similar to theirs. 
A crucial difference is that, while they encode the closure properties and order 
properties into logical formulas and delegate these tasks to the background the- 
ory solvers, we ensure these properties by means of reduction pairs, for which 
well-established techniques exist in the literature. 

Sternagel and Yamada [30] proposed a framework for analyzing reachability 
by combining basic logical manipulations, and Gutiérrez and Lucas [10] proposed 
another framework, similar to the dependency pair framework. The present work 
focuses on atomic analysis techniques, and is orthogonal to these efforts of com- 
bining techniques. 


2 Preliminaries 


We assume familiarity with term rewriting, cf. [3] or [32]. For a binary relation
denoted by a symbol like $\sqsupset$, we denote its dual relation by $\sqsubset$ and the negated
relation by $\not\sqsupset$. Relation composition is denoted by $\circ$.


250 A. Yamada 


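Throughout the paper we fix a set $\mathcal{V}$ of variable symbols. A signature is
a set $\mathcal{F}$ of function symbols, where each $f \in \mathcal{F}$ is associated with its arity,
the number of arguments. The set of terms built from $\mathcal{F}$ and $\mathcal{V}$ is denoted by
$\mathcal{T}(\mathcal{F}, \mathcal{V})$, where a term is either in $\mathcal{V}$ or of the form $f(s_1, \ldots, s_n)$ where $f \in \mathcal{F}$ is
$n$-ary and $s_1, \ldots, s_n \in \mathcal{T}(\mathcal{F}, \mathcal{V})$. Given a term $s \in \mathcal{T}(\mathcal{F}, \mathcal{V})$ and a substitution
$\theta : \mathcal{V} \to \mathcal{T}(\mathcal{F}, \mathcal{V})$, $s\theta$ denotes the term obtained from $s$ by replacing every variable
$x$ by $\theta(x)$. A context is a term $C \in \mathcal{T}(\mathcal{F}, \mathcal{V} \cup \{\square\})$ where a special variable $\square$
occurs exactly once. Given $s \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, we denote by $C[s]$ the term obtained
by replacing $\square$ by $s$ in $C$.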

A relation $\sqsupset$ over terms is closed under substitutions (resp. contexts) iff $s \sqsupset t$
implies $s\theta \sqsupset t\theta$ for any substitution $\theta$ (resp. $C[s] \sqsupset C[t]$ for any context $C$).
Relations over terms that are closed under contexts and substitutions are called
rewrite relations. Rewrite relations which are also preorders are called rewrite
preorders, and those which are strict orders are rewrite orders. Well-founded
rewrite orders are called reduction orders.

A term rewrite system (TRS) $\mathcal{R}$ is a (usually finite) relation over terms, where
each $(l, r) \in \mathcal{R}$ is called a rewrite rule and written $l \to r$. We do not require
the usual assumption that $l \notin \mathcal{V}$ and that variables occurring in $r$ must occur in $l$.
The rewrite step $\to_{\mathcal{R}}$ induced by TRS $\mathcal{R}$ is the least rewrite relation containing
$\mathcal{R}$. Its reflexive transitive closure is denoted by $\to_{\mathcal{R}}^*$, which is the least rewrite
preorder containing $\mathcal{R}$.

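A reachability atom is a pair of terms $s$ and $t$, written $s \twoheadrightarrow t$. We say that
$s \twoheadrightarrow t$ is $\mathcal{R}$-satisfiable iff $s\theta \to_{\mathcal{R}}^* t\theta$ for some $\theta$, and $\mathcal{R}$-unsatisfiable otherwise.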


3 Term Orderings for Non-reachability 


Reduction pairs constitute the core ingredient in proving termination with 
dependency pairs. Just as rewrite orders generalize reduction orders, we first 
introduce the notion of “rewrite pairs” by removing the well-foundedness 
assumption of reduction pairs. 


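Definition 1 (rewrite pair). We call a pair $(\sqsupseteq, \sqsupset)$ of relations an order pair if $\sqsupseteq$ is a preorder, $\sqsupset$ is irreflexive, ${\sqsupset} \subseteq {\sqsupseteq}$, and ${\sqsupseteq} \circ {\sqsupset} \circ {\sqsupseteq} \subseteq {\sqsupset}$. A rewrite pair is an order pair $(\sqsupseteq, \sqsupset)$ over terms such that both $\sqsupseteq$ and $\sqsupset$ are closed under substitutions and $\sqsupseteq$ is closed under contexts. It is called a reduction pair if moreover $\sqsupset$ is well-founded.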


Standard definitions of reduction pairs impose fewer order-like assumptions than the above definition, but the above (more natural) assumptions do not lose the generality of previous definitions [34]. Due to these assumptions, our rewrite pairs satisfy the assumptions of discrimination pairs [1].

The following statement is our first observation: a rewrite pair can prove 
non-reachability. 


Theorem 1. If $(\sqsupseteq, \sqsupset)$ is a rewrite pair, $\mathcal{R} \subseteq {\sqsupseteq}$ and $s \sqsubset t$, then $s \twoheadrightarrow t$ is $\mathcal{R}$-unsatisfiable.
A similar observation has been made in [20, Theorem 11], where well-foundedness is assumed instead of irreflexivity. Note that irreflexivity is essential: if $s \sqsupset s$ for some $s$, then we have $s \sqsubset s$, but $s \twoheadrightarrow s$ is trivially $\mathcal{R}$-satisfiable.

We postpone the proof of Theorem 1 until we obtain the more general Theorem 2. Instead, we first put Theorem 1 to use by generalizing a classical way of defining reduction pairs: the semantic approach [23].


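Definition 2 (model). An $\mathcal{F}$-algebra $\mathcal{A} = (A, [\cdot])$ specifies a set $A$ called the carrier and an interpretation $[f] : A^n \to A$ for each $n$-ary $f \in \mathcal{F}$. The evaluation of a term $s$ under an assignment $\alpha : \mathcal{V} \to A$ is defined as usual and denoted by $[s]\alpha$. A related/preordered $\mathcal{F}$-algebra $(\mathcal{A}, \sqsupseteq) = (A, [\cdot], \sqsupseteq)$ consists of an $\mathcal{F}$-algebra and a relation/preorder $\sqsupseteq$ on $A$. Given $\alpha : \mathcal{V} \to A$, we write $[s \sqsupseteq t]\alpha$ to mean $[s]\alpha \sqsupseteq [t]\alpha$. We write $\mathcal{A} \models s \sqsupseteq t$ if $[s \sqsupseteq t]\alpha$ holds for every $\alpha : \mathcal{V} \to A$. We say $(\mathcal{A}, \sqsupseteq)$ is a (relational) model of a TRS $\mathcal{R}$ if $\mathcal{A} \models l \sqsupseteq r$ for every $l \to r \in \mathcal{R}$. We say $(\mathcal{A}, \sqsupseteq)$ is monotone if $a_i \sqsupseteq a_i'$ implies $[f](a_1,\dots,a_i,\dots,a_n) \sqsupseteq [f](a_1,\dots,a_i',\dots,a_n)$ for arbitrary $a_1,\dots,a_n, a_i' \in A$ and $n$-ary $f \in \mathcal{F}$.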


The notion of relational models is due to van Oostrom [28]. In this paper, 
we simply call them models. Models in terms of equational theory are models 
(A, =) in the above definition, where monotonicity is inherent. Quasi-models of 
Zantema [37] are preordered (or partially ordered) monotone models. Theorem 1 
can be reformulated in the semantic manner as follows: 


Corollary 1. If $(\geq, >)$ is an order pair, $(\mathcal{A}, \geq)$ is a monotone model of $\mathcal{R}$, and $\mathcal{A} \models s < t$, then $s \twoheadrightarrow t$ is $\mathcal{R}$-unsatisfiable.


Note that Corollary 1 does not demand well-foundedness of $>$. In particular, one can employ models over negative numbers (or equivalently, positive numbers with the order pair $(\leq, <)$).


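Example 2. Consider again the TRS $\mathcal{R}_{\mathrm{add}}$ of Example 1. The monotone ordered $\mathcal{F}$-algebra $(\mathbb{Z}_{\le 0}, [\cdot], \geq)$ defined by

$$[\mathsf{add}](x, y) = x + y \qquad [\mathsf{s}](x) = x - 1 \qquad [\mathsf{0}] = 0$$

is a model of $\mathcal{R}_{\mathrm{add}}$: whenever $x, y \in \mathbb{Z}_{\le 0}$, we have

$$[\mathsf{add}]([\mathsf{0}], y) = y \qquad [\mathsf{add}]([\mathsf{s}](x), y) = x + y - 1 = [\mathsf{s}]([\mathsf{add}](x, y))$$

Now we can conclude that the reachability constraint $\mathsf{add}(\mathsf{s}(x), y) \twoheadrightarrow y$ is $\mathcal{R}_{\mathrm{add}}$-unsatisfiable by $(\mathbb{Z}_{\le 0}, [\cdot]) \models \mathsf{add}(\mathsf{s}(x), y) < y$: whenever $x, y \in \mathbb{Z}_{\le 0}$, we have

$$[\mathsf{add}]([\mathsf{s}](x), y) = x + y - 1 < y$$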
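The identities of Example 2 can also be sanity-checked mechanically. The Python sketch below evaluates the interpretation on a finite sample of $\mathbb{Z}_{\le 0}$ and checks rule orientation, monotonicity, and the goal $\mathsf{add}(\mathsf{s}(x), y) < y$; a finite sample is no proof (the algebraic argument above is), and the sample bound is an arbitrary choice of ours.

```python
from itertools import product


# Interpretation of Example 2 on the non-positive integers.
def add(x: int, y: int) -> int:
    return x + y


def s(x: int) -> int:
    return x - 1


ZERO = 0


def check_example2(bound: int = 30) -> bool:
    """Check the model conditions and the goal on the sample {-bound, ..., 0}."""
    sample = range(-bound, 1)
    for x, y in product(sample, repeat=2):
        if not add(ZERO, y) >= y:                  # add(0, y) -> y
            return False
        if not add(s(x), y) >= s(add(x, y)):       # add(s(x), y) -> s(add(x, y))
            return False
        if not add(s(x), y) < y:                   # goal needed for Corollary 1
            return False
    for a, b, c in product(sample, repeat=3):      # monotonicity with respect to >=
        if a >= b and not (add(a, c) >= add(b, c) and add(c, a) >= add(c, b) and s(a) >= s(b)):
            return False
    return True


print(check_example2())  # True
```

The goal check relies on the carrier: for $x = 1$, which lies outside $\mathbb{Z}_{\le 0}$, we would get $x + y - 1 = y$, so the strict inequality fails.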


Observe that in Theorem 1, $\sqsupset$ occurs only in the dual form $\sqsubset$. Hence we now directly analyze the condition which $\sqsupseteq$ and $\sqsubset$ should satisfy to prove non-reachability, and this gives a sound and complete method.
Definition 3 (co-rewrite pair). We call a pair $(\sqsupseteq, \sqsubset)$ of relations over terms a co-rewrite pair if $\sqsupseteq$ is a rewrite preorder, $\sqsubset$ is closed under substitutions, and ${\sqsupseteq} \cap {\sqsubset} = \emptyset$.


Theorem 2. $s \twoheadrightarrow t$ is $\mathcal{R}$-unsatisfiable if and only if there exists a co-rewrite pair $(\sqsupseteq, \sqsubset)$ such that $\mathcal{R} \subseteq {\sqsupseteq}$ and $s \sqsubset t$.


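Proof. For the “if” direction, suppose on the contrary that $s\theta \to_{\mathcal{R}}^* t\theta$ for some $\theta$. Since $\sqsupseteq$ is a rewrite preorder containing $\mathcal{R}$ and $\to_{\mathcal{R}}^*$ is the least such, we must have $s\theta \sqsupseteq t\theta$. On the other hand, since $s \sqsubset t$ and $\sqsubset$ is closed under substitutions, we have $s\theta \sqsubset t\theta$. This is not possible since ${\sqsupseteq} \cap {\sqsubset} = \emptyset$.

For the “only if” direction, take $\to_{\mathcal{R}}^*$ as $\sqsupseteq$ and define $\sqsubset$ by $s \sqsubset t$ iff $s \twoheadrightarrow t$ is $\mathcal{R}$-unsatisfiable. Then clearly $\sqsubset$ is closed under substitutions, ${\to_{\mathcal{R}}^*} \cap {\sqsubset} = \emptyset$, and $\mathcal{R} \subseteq {\to_{\mathcal{R}}^*}$.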


Theorem 2 can be reformulated more concisely in the model-oriented manner, as the greatest choice of $\sqsubset$ can be specified: $s \sqsubset t$ iff $\mathcal{A} \models s \not\geq t$.


Corollary 2. $s \twoheadrightarrow t$ is $\mathcal{R}$-unsatisfiable if and only if there exists a monotone preordered model $(\mathcal{A}, \geq)$ of $\mathcal{R}$ such that $\mathcal{A} \models s \not\geq t$.


Corollary 2 is useful when models over non-totally ordered carriers are con- 
sidered. There are important methods (for termination) that crucially rely on 
such carriers: the matrix interpretations [5], or more generally the tuple inter- 
pretations [15,34]. 


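Example 3. Consider the following TRS, where the first rule is from [5]:

$$\mathcal{R}_{\mathrm{mat}} = \{\, \mathsf{f}(\mathsf{f}(x)) \to \mathsf{f}(\mathsf{g}(\mathsf{f}(x))),\ \mathsf{f}(x) \to x \,\}$$

The preordered $\{\mathsf{f}, \mathsf{g}\}$-algebra $(\mathbb{N}^2, [\cdot], \geq)$ defined by

$$[\mathsf{f}]\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}x + y + 1\\ y + 1\end{pmatrix} \qquad [\mathsf{g}]\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}x + 1\\ 0\end{pmatrix}$$

is a model of $\mathcal{R}_{\mathrm{mat}}$, where $\geq$ is extended pointwise over $\mathbb{N}^2$. Indeed, the first rule is oriented as the following calculation demonstrates:

$$[\mathsf{f}]\Bigl([\mathsf{f}]\begin{pmatrix}x\\ y\end{pmatrix}\Bigr) = \begin{pmatrix}x + 2y + 3\\ y + 2\end{pmatrix} \geq \begin{pmatrix}x + y + 3\\ 1\end{pmatrix} = [\mathsf{f}]\Bigl([\mathsf{g}]\Bigl([\mathsf{f}]\begin{pmatrix}x\\ y\end{pmatrix}\Bigr)\Bigr)$$

and the second rule can be easily checked. Now we prove that $x \twoheadrightarrow \mathsf{g}(x)$ is $\mathcal{R}_{\mathrm{mat}}$-unsatisfiable by Corollary 2. Indeed, $(\mathbb{N}^2, [\cdot]) \models x \not\geq \mathsf{g}(x)$ is shown as follows:

$$\begin{pmatrix}x\\ y\end{pmatrix} \not\geq \begin{pmatrix}x + 1\\ 0\end{pmatrix}$$

for any $x, y \in \mathbb{N}$. Note also that Theorem 1 is not applicable, since $(\mathbb{N}^2, [\cdot]) \not\models x < \mathsf{g}(x)$ due to the second coordinate.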
We conclude the section by proving Theorem 1 via Theorem 2.


Proof (of Theorem 1). We show that $(\sqsupseteq, \sqsubset)$ forms a co-rewrite pair when $(\sqsupseteq, \sqsupset)$ is a rewrite pair. It suffices to show that ${\sqsupseteq} \cap {\sqsubset} = \emptyset$. To this end, suppose on the contrary that $s \sqsupseteq t \sqsupset s$. By compatibility, we have $s \sqsupset s$, which contradicts the irreflexivity of $\sqsupset$.


4 Weighted Path Order for Non-reachability 


The previous section was concerned with the semantic approach towards obtain- 
ing (co-)rewrite pairs. In this section we focus on the syntactic approach. We 
choose the weighted path order (WPO), which subsumes both the lexicographic 
path order (LPO) and the Knuth-Bendix order (KBO), so the result of this 
section applies to these more well-known methods. The multiset path order [4] 
can also be subsumed [29], but we omit this extension to keep the presentation 
simple. WPO is induced by three ingredients: an F-algebra; a precedence order- 
ing over function symbols; and a (partial) status, which controls the recursive 
behavior of the ordering. 


Definition 4 (partial status). A partial status $\pi$ specifies for each $n$-ary $f \in \mathcal{F}$ a list $\pi(f) \in \{1,\dots,n\}^*$, also seen as a set, of its argument positions. We say $\pi$ is total if $1,\dots,n \in \pi(f)$ whenever $f$ is $n$-ary. When $\pi(f) = [i_1,\dots,i_m]$, we denote $[s_{i_1},\dots,s_{i_m}]$ by $\pi_f(s_1,\dots,s_n)$.


For instance, the empty status $\pi(f) = []$ allows WPO to subsume weakly monotone interpretations [36, Section 4.1]. We allow positions to be duplicated, following [33].
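As a small illustration of Definition 4, the following Python helper computes $\pi_f(s_1,\dots,s_n)$ from a status given as a list of 1-based positions, duplicates included; representing $\pi$ as a dictionary from symbols to position lists is an assumption of this sketch.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")
Status = Dict[str, List[int]]  # pi(f) as a list of 1-based argument positions


def apply_status(pi: Status, f: str, args: Sequence[T]) -> List[T]:
    """pi_f(s_1, ..., s_n) = [s_{i_1}, ..., s_{i_m}] where pi(f) = [i_1, ..., i_m]."""
    n = len(args)
    positions = pi[f]
    if not all(1 <= i <= n for i in positions):
        raise ValueError(f"status of {f} mentions a position outside 1..{n}")
    return [args[i - 1] for i in positions]


def is_total_for(pi: Status, f: str, arity: int) -> bool:
    """pi is total on f iff every position 1..arity occurs in pi(f)."""
    return set(range(1, arity + 1)) <= set(pi[f])


pi = {"f": [2, 2, 1], "g": []}
print(apply_status(pi, "f", ["s1", "s2", "s3"]))  # ['s2', 's2', 's1']
print(apply_status(pi, "g", ["s1"]))              # []
print(is_total_for(pi, "f", 3))                   # False: position 3 is filtered out
```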


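Definition 5 (WPO [36]). Let $\pi$ be a partial status, $\mathcal{A}$ an $\mathcal{F}$-algebra, and $(\geq, >)$ and $(\succsim, \succ)$ pairs of relations on $A$ and $\mathcal{F}$, respectively. The weighted path order $\mathrm{WPO}(\pi, \mathcal{A}, \geq, >, \succsim, \succ)$, or $\mathrm{WPO}(\mathcal{A})$ or even WPO for short, is the pair $(\sqsupseteq_{\mathrm{WPO}}, \sqsupset_{\mathrm{WPO}})$ of relations over terms defined as follows: $s \sqsupset_{\mathrm{WPO}} t$ iff

1. $\mathcal{A} \models s > t$, or
2. $\mathcal{A} \models s \geq t$ and
   (a) $s = f(s_1,\dots,s_n)$ and $s_i \sqsupseteq_{\mathrm{WPO}} t$ for some $i \in \pi(f)$; or
   (b) $s = f(s_1,\dots,s_n)$, $t = g(t_1,\dots,t_m)$, $s \sqsupset_{\mathrm{WPO}} t_j$ for every $j \in \pi(g)$, and
       i. $f \succ g$, or
       ii. $f \succsim g$ and $\pi_f(s_1,\dots,s_n) \sqsupset^{\mathrm{lex}}_{\mathrm{WPO}} \pi_g(t_1,\dots,t_m)$.

The relation $\sqsupseteq_{\mathrm{WPO}}$ is defined similarly, but with $\sqsupseteq^{\mathrm{lex}}_{\mathrm{WPO}}$ instead of $\sqsupset^{\mathrm{lex}}_{\mathrm{WPO}}$ in (2b-ii), and the following subcase is added in case 2:

   (c) $s = t \in \mathcal{V}$.

Here $(\sqsupseteq^{\mathrm{lex}}_P, \sqsupset^{\mathrm{lex}}_P)$ denotes the lexicographic extension of a pair $P = (\sqsupseteq_P, \sqsupset_P)$ of relations, defined by: $[s_1,\dots,s_n] \sqsupset^{\mathrm{lex}}_P [t_1,\dots,t_m]$ iff

- $m = 0$ and $n > 0$, or
- $m, n > 0$ and $s_1 \sqsupset_P t_1$, or both $s_1 \sqsupseteq_P t_1$ and $[s_2,\dots,s_n] \sqsupset^{\mathrm{lex}}_P [t_2,\dots,t_m]$;

and $[s_1,\dots,s_n] \sqsupseteq^{\mathrm{lex}}_P [t_1,\dots,t_m]$ is defined in the same way, with $n \geq 0$ in the first case and $\sqsupseteq^{\mathrm{lex}}_P$ on the tails.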
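The lexicographic extension translates directly into code. The Python sketch below is parameterized by two predicates `ge` and `gt` standing for $\sqsupseteq_P$ and $\sqsupset_P$ (names ours); the strict and non-strict versions differ only in the base case for an empty right-hand list.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
Rel = Callable[[T, T], bool]


def lex(ge: Rel, gt: Rel, xs: Sequence[T], ys: Sequence[T], strict: bool) -> bool:
    """[x1..xn] is strictly (strict=True) or weakly (strict=False) greater than [y1..ym]
    in the lexicographic extension of the pair (ge, gt)."""
    if not ys:                       # m = 0: strict needs n > 0, weak accepts any n >= 0
        return len(xs) > 0 if strict else True
    if not xs:                       # n = 0 < m
        return False
    return gt(xs[0], ys[0]) or (ge(xs[0], ys[0]) and lex(ge, gt, xs[1:], ys[1:], strict))


def ge(a: int, b: int) -> bool:
    return a >= b


def gt(a: int, b: int) -> bool:
    return a > b


print(lex(ge, gt, [2, 0], [1, 5], strict=True))   # True: decided by the first position
print(lex(ge, gt, [1], [1, 0], strict=False))     # False: the left list runs out first
print(lex(ge, gt, [], [], strict=False), lex(ge, gt, [], [], strict=True))  # True False
```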


We write LPO for WPO induced by a total status $\pi$ and a trivial $\mathcal{F}$-algebra as $\mathcal{A}$. Allowing partial statuses corresponds to applying argument filters [2,17] (except for collapsing ones). KBO is a special case of WPO where $\pi$ is total and $\mathcal{A}$ is induced by an admissible weight function.

For termination analysis, a precondition for WPO to be a reduction pair is crucial. In this work, we only need it to be a rewrite pair; that is, well-foundedness is not necessary. Thus, for instance, it is possible to have $x \sqsupset_{\mathrm{WPO}} f(x)$ by $[f](x) = x - 1$. This explains why $s \in \mathcal{V}$ is permitted in case 1, which might look useless to those who are already familiar with termination analysis.

We formulate the main claim of this section as follows. 


Definition 6 ($\pi$-simplicity). We say a related $\mathcal{F}$-algebra $(A, [\cdot], \geq)$ is $\pi$-simple¹ for a partial status $\pi$ iff $[f](a_1,\dots,a_n) \geq a_i$ for arbitrary $n$-ary $f \in \mathcal{F}$, $a_1,\dots,a_n \in A$, and $i \in \pi(f)$.


Proposition 1. If $(\geq, >)$ and $(\succsim, \succ)$ are order pairs on $A$ and $\mathcal{F}$, and $(\mathcal{A}, \geq)$ is monotone and $\pi$-simple, then $(\sqsupseteq_{\mathrm{WPO}}, \sqsupset_{\mathrm{WPO}})$ is a rewrite pair.


Under these conditions, it is known that $\sqsupseteq_{\mathrm{WPO}}$ is closed under contexts and $\sqsupset_{\mathrm{WPO}}$ is compatible with $\sqsupseteq_{\mathrm{WPO}}$ [36, Lemmas 7, 10, 13]. Later in this section we prove the other properties necessary for Proposition 1, for which the claims in [36] must be generalized for the purpose of this paper.

The benefit of having syntax-aware methods can be easily observed by recall- 
ing why we have them in termination analysis. 


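Example 4 ([13]). Consider the TRS $\mathcal{R}_{\mathsf{A}}$ consisting of the following rules:

$$\mathsf{A}(\mathsf{0}, y) \to \mathsf{s}(y) \qquad \mathsf{A}(\mathsf{s}(x), \mathsf{0}) \to \mathsf{A}(x, \mathsf{s}(\mathsf{0})) \qquad \mathsf{A}(\mathsf{s}(x), \mathsf{s}(y)) \to \mathsf{A}(x, \mathsf{A}(\mathsf{s}(x), y))$$

and suppose that a monotone $\{\mathsf{A}, \mathsf{s}, \mathsf{0}\}$-algebra $(\mathbb{N}, [\cdot], \geq)$ is a model of $\mathcal{R}_{\mathsf{A}}$. Then, denoting the Ackermann function by $\mathrm{Ack}$, we have

$$[\mathsf{A}]([\mathsf{s}]^m([\mathsf{0}]), [\mathsf{s}]^n([\mathsf{0}])) \geq [\mathsf{s}]^{\mathrm{Ack}(m,n)}([\mathsf{0}]) \qquad (1)$$

Now consider proving the obvious fact that $x \twoheadrightarrow \mathsf{s}(x)$ is $\mathcal{R}_{\mathsf{A}}$-unsatisfiable. This requires $(\mathbb{N}, [\cdot]) \models x < \mathsf{s}(x)$, and then $[\mathsf{s}]^n([\mathsf{0}]) \geq n$ by an inductive argument. This is not possible if $[\mathsf{A}]$ is primitive recursive (e.g., a polynomial), since (1) with $[\mathsf{s}]^{\mathrm{Ack}(m,n)}([\mathsf{0}]) \geq \mathrm{Ack}(m,n)$ contradicts the well-known fact that the Ackermann function has no primitive-recursive bound.

On the other hand, LPO with $\mathsf{A} \succ \mathsf{s}$ satisfies $\mathcal{R}_{\mathsf{A}} \subseteq {\sqsupset_{\mathrm{LPO}}}$ $(\subseteq {\sqsupseteq_{\mathrm{LPO}}})$ and $x \sqsubset_{\mathrm{LPO}} \mathsf{s}(x)$. Thus Theorem 1 with $(\sqsupseteq, \sqsupset) = (\sqsupseteq_{\mathrm{LPO}}, \sqsupset_{\mathrm{LPO}})$ proves that $x \twoheadrightarrow \mathsf{s}(x)$ is $\mathcal{R}_{\mathsf{A}}$-unsatisfiable, thanks to Proposition 1.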
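The LPO part of Example 4 can be checked with a direct transcription of Definition 5. The Python sketch below assumes the trivial algebra (so case 1 never applies and $\mathcal{A} \models s \geq t$ always holds), the total status, and the precedence $\mathsf{A} \succ \mathsf{s}$ with its reflexive closure as the weak part; it is a naive exponential-time recursion meant for small examples only, and its names are ours.

```python
from typing import Sequence, Tuple, Union

Term = Union[str, Tuple[str, tuple]]  # variable (str) or (symbol, argument tuple)

PREC_GT = {("A", "s")}  # strict precedence A > s; the weak part is its reflexive closure


def is_var(t: Term) -> bool:
    return isinstance(t, str)


def lpo(s: Term, t: Term, strict: bool) -> bool:
    """s is strictly (strict=True) or weakly greater than t in LPO, i.e. WPO with a
    trivial algebra (A |= s >= t always, A |= s > t never) and total status."""
    if is_var(s):
        return (not strict) and s == t               # only case (2c) can apply
    f, ss = s
    if any(lpo(si, t, False) for si in ss):          # case (2a)
        return True
    if is_var(t):
        return False
    g, ts = t
    if not all(lpo(s, tj, True) for tj in ts):       # precondition of case (2b)
        return False
    if (f, g) in PREC_GT:                            # case (2b-i)
        return True
    return f == g and lex(ss, ts, strict)            # case (2b-ii)


def lex(xs: Sequence[Term], ys: Sequence[Term], strict: bool) -> bool:
    if not ys:
        return len(xs) > 0 if strict else True
    if not xs:
        return False
    return lpo(xs[0], ys[0], True) or (lpo(xs[0], ys[0], False) and lex(xs[1:], ys[1:], strict))


ZERO = ("0", ())


def S(t: Term) -> Term:
    return ("s", (t,))


def A(a: Term, b: Term) -> Term:
    return ("A", (a, b))


R_A = [
    (A(ZERO, "y"), S("y")),
    (A(S("x"), ZERO), A("x", S(ZERO))),
    (A(S("x"), S("y")), A("x", A(S("x"), "y"))),
]
print(all(lpo(l, r, True) for l, r in R_A))  # True: every rule is strictly decreasing
print(lpo(S("x"), "x", True))                # True: x lies strictly below s(x)
```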


¹ Such a property would be called inflationary in the mathematics literature. In term rewriting, the word simple has been used (see, e.g., [32]) in accordance with simplification orders.
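Example 5. Consider the TRS consisting of the following rules:

$$\mathcal{R}_{\mathrm{kbo}} = \{\, \mathsf{f}(\mathsf{g}(x)) \to \mathsf{g}(\mathsf{f}(\mathsf{f}(x))),\ \mathsf{g}(x) \to x \,\}$$

WPO (or KBO) induced by $\mathcal{A} = (\mathbb{N}, [\cdot])$ and a precedence $(\succsim, \succ)$ such that

$$[\mathsf{f}](x) = x \qquad [\mathsf{g}](x) = x + 1 \qquad \mathsf{f} \succ \mathsf{g}$$

satisfies $\mathcal{R}_{\mathrm{kbo}} \subseteq {\sqsupset_{\mathrm{WPO}}}$. Thus, for instance, $\mathsf{g}(x) \twoheadrightarrow \mathsf{g}(\mathsf{f}(x))$ is $\mathcal{R}_{\mathrm{kbo}}$-unsatisfiable by Theorem 1. On the other hand, let $(A, [\cdot], \geq)$ with $A \subseteq \mathbb{Z}$ be a model of $\mathcal{R}_{\mathrm{kbo}}$. Using the idea of [38, Proposition 11], one can show $[\mathsf{f}](x) \leq x$. Hence, Corollary 2 with models over a subset of the integers cannot handle the problem. LPO orients the first rule from right to left and hence cannot handle the problem either.

The power of WPO can also be easily verified by considering

$$\mathcal{R}_{\mathrm{wpo}} = \mathcal{R}_{\mathrm{kbo}} \cup \{\, \mathsf{f}(\mathsf{h}(x)) \to \mathsf{h}(\mathsf{h}(\mathsf{f}(x))),\ \mathsf{f}(x) \to x \,\}$$

By extending the above WPO with $[\mathsf{h}](x) = x$ and $\mathsf{f} \succ \mathsf{h}$, which does not fall into the class of KBO anymore,² we can prove, e.g., that $\mathsf{f}(x) \twoheadrightarrow \mathsf{f}(\mathsf{h}(x))$ is $\mathcal{R}_{\mathrm{wpo}}$-unsatisfiable. None of the above-mentioned methods can handle this problem.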
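Example 5 can likewise be replayed. The sketch below transcribes Definition 5 with the algebra of Example 5, under the simplifying assumption (ours) that every symbol is interpreted as a constant weight plus the sum of its arguments. For such algebras over $\mathbb{N}$, $\mathcal{A} \models s \geq t$ holds exactly when the weight of $s$ is at least that of $t$ and no variable occurs more often in $t$ than in $s$; for $>$ the weight must be strictly larger. The status is total and the precedence is $\mathsf{f} \succ \mathsf{g}$, $\mathsf{f} \succ \mathsf{h}$; all names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple, Union

Term = Union[str, Tuple[str, tuple]]  # variable (str) or (symbol, argument tuple)

# [f](x) = x, [g](x) = x + 1, [h](x) = x, each read as weight + sum of arguments.
WEIGHT = {"f": 0, "g": 1, "h": 0}
PREC_GT = {("f", "g"), ("f", "h")}  # strict precedence; the weak part is its reflexive closure


def is_var(t: Term) -> bool:
    return isinstance(t, str)


def linear(t: Term) -> Tuple[int, Counter]:
    """Return (c, occ) such that [t]alpha = c + sum over x of occ[x] * alpha(x)."""
    if is_var(t):
        return 0, Counter({t: 1})
    f, args = t
    c, occ = WEIGHT[f], Counter()
    for a in args:
        ca, oa = linear(a)
        c += ca
        occ += oa
    return c, occ


def alg_ge(s: Term, t: Term, strict: bool = False) -> bool:
    """A |= s >= t (or A |= s > t when strict) for all assignments into N."""
    (cs, occ_s), (ct, occ_t) = linear(s), linear(t)
    weight_ok = cs > ct if strict else cs >= ct
    return weight_ok and all(occ_s[x] >= n for x, n in occ_t.items())


def wpo(s: Term, t: Term, strict: bool) -> bool:
    if alg_ge(s, t, strict=True):                     # case 1
        return True
    if not alg_ge(s, t):                              # case 2 needs A |= s >= t
        return False
    if is_var(s):
        return (not strict) and s == t                # case (2c)
    f, ss = s
    if any(wpo(si, t, False) for si in ss):           # case (2a)
        return True
    if is_var(t):
        return False
    g, ts = t
    if not all(wpo(s, tj, True) for tj in ts):        # precondition of (2b)
        return False
    if (f, g) in PREC_GT:                             # case (2b-i)
        return True
    return f == g and lex(ss, ts, strict)             # case (2b-ii)


def lex(xs: Sequence[Term], ys: Sequence[Term], strict: bool) -> bool:
    if not ys:
        return len(xs) > 0 if strict else True
    if not xs:
        return False
    return wpo(xs[0], ys[0], True) or (wpo(xs[0], ys[0], False) and lex(xs[1:], ys[1:], strict))


def F(t: Term) -> Term:
    return ("f", (t,))


def G(t: Term) -> Term:
    return ("g", (t,))


def H(t: Term) -> Term:
    return ("h", (t,))


R_KBO = [(F(G("x")), G(F(F("x")))), (G("x"), "x")]
R_WPO = R_KBO + [(F(H("x")), H(H(F("x")))), (F("x"), "x")]
print(all(wpo(l, r, True) for l, r in R_WPO))   # True
print(wpo(G(F("x")), G("x"), True))             # True: g(x) lies below g(f(x))
print(wpo(F(H("x")), F("x"), True))             # True: f(x) lies below f(h(x))
```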


The rest of this section is dedicated to proving Proposition 1. Similar results are present in [36], but they make implicit assumptions such as that $\geq$ and $\succsim$ are preorders. In this paper we need more essential assumptions, as we will consider non-transitive relations in the next section.

First we reprove the reflexivity of $\sqsupseteq_{\mathrm{WPO}}$. The proof also serves as a basis for the more complicated irreflexivity proof.


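Lemma 1. If both $\geq$ and $\succsim$ are reflexive and $(\mathcal{A}, \geq)$ is $\pi$-simple, then

1. $i \in \pi(f)$ implies $f(s_1,\dots,s_n) \sqsupset_{\mathrm{WPO}} s_i$, and
2. $s \sqsupseteq_{\mathrm{WPO}} s$, i.e., $\sqsupseteq_{\mathrm{WPO}}$ is reflexive.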


Proof. As $s \sqsupseteq_{\mathrm{WPO}} s$ is trivial when $s \in \mathcal{V}$, we assume $s = f(s_1,\dots,s_n)$ and prove the two claims by induction on the structure of $s$. For the first claim, by $\pi$-simplicity, for any $\alpha$ we have $[s]\alpha = [f]([s_1]\alpha,\dots,[s_n]\alpha) \geq [s_i]\alpha$, and hence $\mathcal{A} \models s \geq s_i$. By the second claim of the induction hypothesis we have $s_i \sqsupseteq_{\mathrm{WPO}} s_i$, and thus $s \sqsupset_{\mathrm{WPO}} s_i$ follows by (2a) of Definition 5. Next we show that $s \sqsupseteq_{\mathrm{WPO}} s$ holds by (2b-ii). Indeed, $\mathcal{A} \models s \geq s$ follows from the reflexivity of $\geq$; $s \sqsupset_{\mathrm{WPO}} s_i$ for every $i \in \pi(f)$ as shown above; $f \succsim f$ as $\succsim$ is reflexive; and finally, $\pi_f(s_1,\dots,s_n) \sqsupseteq^{\mathrm{lex}}_{\mathrm{WPO}} \pi_f(s_1,\dots,s_n)$ is due to the induction hypothesis and the fact that lexicographic extension preserves reflexivity.


Using reflexivity, we can show that both $\sqsupseteq_{\mathrm{WPO}}$ and $\sqsupset_{\mathrm{WPO}}$ are closed under substitutions. This result will be reused in Sect. 5, where it will be essential that neither $\geq$ nor $\succsim$ need be transitive.


² When $[\mathsf{h}]$ is the identity, KBO requires $\mathsf{h} \succsim f$ for any $f$.
Lemma 2. If both $\geq$ and $\succsim$ are reflexive and $(\mathcal{A}, \geq)$ is $\pi$-simple, then both $\sqsupseteq_{\mathrm{WPO}}$ and $\sqsupset_{\mathrm{WPO}}$ are closed under substitutions.


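Proof. We prove by induction on $s$ and $t$ that $s \sqsupseteq_{\mathrm{WPO}} t$ implies $s\theta \sqsupseteq_{\mathrm{WPO}} t\theta$ and that $s \sqsupset_{\mathrm{WPO}} t$ implies $s\theta \sqsupset_{\mathrm{WPO}} t\theta$. We prove the first claim by case analysis on how $s \sqsupseteq_{\mathrm{WPO}} t$ is derived. The other claim is analogous, without case (2c) below.

1. $\mathcal{A} \models s > t$: Then we have $\mathcal{A} \models s\theta > t\theta$ and thus $s\theta \sqsupseteq_{\mathrm{WPO}} t\theta$ by case 1.
2. $\mathcal{A} \models s \geq t$: Then we have $\mathcal{A} \models s\theta \geq t\theta$. There are the following subcases.
   (a) $s = f(s_1,\dots,s_n)$ and $s_i \sqsupseteq_{\mathrm{WPO}} t$ for some $i \in \pi(f)$: In this case, we know $s_i\theta \sqsupseteq_{\mathrm{WPO}} t\theta$ by the induction hypothesis on $s$. Thus (2a) concludes $s\theta \sqsupseteq_{\mathrm{WPO}} t\theta$.
   (b) $s = f(s_1,\dots,s_n)$, $t = g(t_1,\dots,t_m)$, and $s \sqsupset_{\mathrm{WPO}} t_j$ for every $j \in \pi(g)$: By the induction hypothesis on $t$, we have $s\theta \sqsupset_{\mathrm{WPO}} t_j\theta$. So the precondition of (2b) for $s\theta \sqsupseteq_{\mathrm{WPO}} t\theta$ is satisfied. There are the following subcases:
       i. $f \succ g$: Then (2b-i) concludes.
       ii. $f \succsim g$ and $\pi_f(s_1,\dots,s_n) \sqsupseteq^{\mathrm{lex}}_{\mathrm{WPO}} \pi_g(t_1,\dots,t_m)$: Then by the induction hypothesis we have $\pi_f(s_1\theta,\dots,s_n\theta) \sqsupseteq^{\mathrm{lex}}_{\mathrm{WPO}} \pi_g(t_1\theta,\dots,t_m\theta)$, and thus (2b-ii) concludes.
   (c) $s = t \in \mathcal{V}$: Then we have $s\theta \sqsupseteq_{\mathrm{WPO}} t\theta$ by Lemma 1.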


Irreflexivity of $\sqsupset_{\mathrm{WPO}}$ is less obvious. In fact, [36] uses well-foundedness to prove it. Here we identify more essential conditions.


Lemma 3. If $(\geq, >)$ is an order pair on $A$, $\succ$ is irreflexive on $\mathcal{F}$, and $(\mathcal{A}, \geq)$ is $\pi$-simple, then $\sqsupset_{\mathrm{WPO}}$ is irreflexive.


Proof. We show $s \not\sqsupset_{\mathrm{WPO}} s$ for every $s$ by induction on the structure of $s$. This is clear if $s \in \mathcal{V}$, so consider $s = f(s_1,\dots,s_n)$. Since $>$ is irreflexive, we have $\mathcal{A} \not\models s > s$, and thus $s \sqsupset_{\mathrm{WPO}} s$ cannot be due to case 1 of Definition 5. As $\succ$ is irreflexive on $\mathcal{F}$, $f \not\succ f$ and thus (2b-i) is not possible either. Thanks to the induction hypothesis and the fact that lexicographic extension preserves irreflexivity, we have $\pi_f(s_1,\dots,s_n) \not\sqsupset^{\mathrm{lex}}_{\mathrm{WPO}} \pi_f(s_1,\dots,s_n)$, and thus (2b-ii) is not possible either.

The remaining case (2a) is more involved. To show $s_i \not\sqsupseteq_{\mathrm{WPO}} f(s_1,\dots,s_n)$ for any $i \in \pi(f)$, we prove the following more general claim: $s' \lhd_\pi^+ s$ implies $s' \not\sqsupseteq_{\mathrm{WPO}} s$, where $\lhd_\pi$ denotes the least relation such that $s_i \lhd_\pi f(s_1,\dots,s_n)$ if $i \in \pi(f)$. This claim is proved by induction on $s'$. Due to the simplicity assumption, we have $\mathcal{A} \models s \geq s'$ for every $s' \lhd_\pi s$, and this generalizes to every $s' \lhd_\pi^+ s$ by an easy induction and the transitivity of $\geq$. Thus we cannot have $\mathcal{A} \models s' > s$, since $\mathcal{A} \models s \geq s' > s$ contradicts the assumption that $(\geq, >)$ is an order pair. This tells us that $s' \sqsupseteq_{\mathrm{WPO}} s$ cannot be due to case 1. Case (2a) is not applicable thanks to the (inner) induction hypothesis on $s'$. Case (2b) is not possible either, since $s' \not\sqsupset_{\mathrm{WPO}} s'$ thanks to the (outer) induction hypothesis on $s$. This concludes $s' \not\sqsupseteq_{\mathrm{WPO}} s$ for any $s' \lhd_\pi^+ s$, and in particular $s_i \not\sqsupseteq_{\mathrm{WPO}} s$ for any $i \in \pi(f)$, refuting the last possibility for $s \sqsupset_{\mathrm{WPO}} s$ to hold.
5 Co-WPO


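The preceding section demonstrated how to use WPO as a rewrite pair in Theorem 1. In this section we show how to use WPO in combination with Theorem 2, that is, when ${\sqsupseteq} = {\sqsupseteq_{\mathrm{WPO}}}$, what $\sqsubset$ should be. We show that $\sqsubset_{\overline{\mathrm{WPO}}}$, where $\overline{\mathrm{WPO}} := \mathrm{WPO}(\pi, \mathcal{A}, \not<, \not\leq, \not\prec, \not\precsim)$, serves the purpose.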


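Proposition 2. If $(\geq, >)$ and $(\succsim, \succ)$ are order pairs on $A$ and $\mathcal{F}$, and $(\mathcal{A}, \geq)$ is $\pi$-simple and monotone, then $(\sqsupseteq_{\mathrm{WPO}}, \sqsubset_{\overline{\mathrm{WPO}}})$ is a co-rewrite pair.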


When $(\mathcal{A}, \geq)$ is not total, Example 3 also demonstrates that using Proposition 2 with Theorem 2 is more powerful than using Proposition 1 in combination with Theorem 1, by taking $\pi(f) = []$ for every $f$. At the time of writing, however, it is unclear to the author whether the difference still exists when $(\mathcal{A}, \geq)$ is totally ordered but $(\mathcal{F}, \succsim)$ is not. Nevertheless, we will clearly see the merit of Proposition 2 in the setting of conditional rewriting in the next section.

The remainder of this section proves Proposition 2. Unfortunately, $\overline{\mathrm{WPO}}$ does not satisfy many important properties of WPO, mostly due to the fact that $(\not<, \not\leq)$ is not even an order pair. Nevertheless, Lemma 2 is applicable to $\overline{\mathrm{WPO}}$ and gives the following fact:


Lemma 4. If $(\geq, >)$ is an order pair on $A$, $(\mathcal{A}, \geq)$ is $\pi$-simple, and $\succ$ is irreflexive, then $\sqsubset_{\overline{\mathrm{WPO}}}$ is closed under substitutions.


Proof. We apply Lemma 2 to $\overline{\mathrm{WPO}}$. To this end, we need to prove the following:

- $(\mathcal{A}, \not<)$ is $\pi$-simple: Suppose on the contrary one had $[f](a_1,\dots,a_n) < a_i$ with $i \in \pi(f)$. Due to the simplicity assumption, we have $[f](a_1,\dots,a_n) \geq a_i$. By compatibility we must have $a_i < a_i$, contradicting irreflexivity.
- $\not<$ and $\not\prec$ are reflexive: This follows from the irreflexivity of $<$ and $\prec$.


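The remaining task is to show that ${\sqsupseteq_{\mathrm{WPO}}} \cap {\sqsubset_{\overline{\mathrm{WPO}}}} = \emptyset$. Due to the mutually inductive definition of WPO, we need to simultaneously prove the property for the other combination: ${\sqsupset_{\mathrm{WPO}}} \cap {\sqsubseteq_{\overline{\mathrm{WPO}}}} = \emptyset$.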


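Definition 7. We say that two pairs $P = (\sqsupseteq_P, \sqsupset_P)$ and $Q = (\sqsupseteq_Q, \sqsupset_Q)$ of relations are co-compatible iff ${\sqsupseteq_P} \cap {\sqsubset_Q} = {\sqsupset_P} \cap {\sqsubseteq_Q} = \emptyset$.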


The next claim is a justification for the word “compatible” in Definition 7. 
Here the compatibility assumption of order pairs is crucial. 


Proposition 3. An order pair $(\sqsupseteq, \sqsupset)$ is co-compatible with itself.


Proof. Suppose on the contrary that $a \sqsupseteq b$ and $b \sqsupset a$. Then we have $a \sqsupset a$ by compatibility, contradicting the irreflexivity of $\sqsupset$.


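Lemma 5. If $P = (\sqsupseteq_P, \sqsupset_P)$ and $Q = (\sqsupseteq_Q, \sqsupset_Q)$ are co-compatible pairs of relations, then $(\sqsupseteq^{\mathrm{lex}}_P, \sqsupset^{\mathrm{lex}}_P)$ and $(\sqsupseteq^{\mathrm{lex}}_Q, \sqsupset^{\mathrm{lex}}_Q)$ are co-compatible.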
Proof. Let us assume that both

$$[s_1,\dots,s_n] \sqsupseteq^{\mathrm{lex}}_P [t_1,\dots,t_m] \qquad (2)$$
$$[s_1,\dots,s_n] \sqsubset^{\mathrm{lex}}_Q [t_1,\dots,t_m] \qquad (3)$$

hold and derive a contradiction. The other part, ${\sqsupset^{\mathrm{lex}}_P} \cap {\sqsubseteq^{\mathrm{lex}}_Q}$, is analogous. We proceed by induction on the length of $[s_1,\dots,s_n]$. If $n = 0$, then (2) demands $m = 0$ but (3) demands $m > 0$. Hence we have $n > 0$, and then (3) demands $m > 0$. If $s_1 \sqsupset_P t_1$, then by assumption we have $s_1 \not\sqsubseteq_Q t_1$, but (3) demands $s_1 \sqsubseteq_Q t_1$ (or $s_1 \sqsubset_Q t_1$). Hence (2) is due to $s_1 \sqsupseteq_P t_1$ and $[s_2,\dots,s_n] \sqsupseteq^{\mathrm{lex}}_P [t_2,\dots,t_m]$. By assumption we have $s_1 \not\sqsubset_Q t_1$, so (3) is due to $s_1 \sqsubseteq_Q t_1$ and $[s_2,\dots,s_n] \sqsubset^{\mathrm{lex}}_Q [t_2,\dots,t_m]$. We derive a contradiction by the induction hypothesis.
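Proposition 3 and Lemma 5 can be spot-checked exhaustively on a finite instance. The Python sketch below takes $P = Q = (\geq, >)$ on $\{0, 1, 2\}$ and all lists of length at most 3 (our choice of instance), using that $a \sqsubset_Q b$ means $b \sqsupset_Q a$; it also shows that the check detects a pair that is not co-compatible. Such a test only covers the chosen instance and does not replace the proofs.

```python
from itertools import product
from typing import Any, Callable, Iterable, Sequence

Rel = Callable[[Any, Any], bool]


def lex(ge: Rel, gt: Rel, xs: Sequence, ys: Sequence, strict: bool) -> bool:
    """Lexicographic extension of (ge, gt), strict or weak, as in Definition 5."""
    if not ys:
        return len(xs) > 0 if strict else True
    if not xs:
        return False
    return gt(xs[0], ys[0]) or (ge(xs[0], ys[0]) and lex(ge, gt, xs[1:], ys[1:], strict))


def co_compatible(ge_p: Rel, gt_p: Rel, ge_q: Rel, gt_q: Rel, domain: Iterable) -> bool:
    """Definition 7 on a finite domain; the dual a below b of gt_q is tested as gt_q(b, a)."""
    dom = list(domain)
    return not any((ge_p(a, b) and gt_q(b, a)) or (gt_p(a, b) and ge_q(b, a))
                   for a, b in product(dom, repeat=2))


def ge(a: int, b: int) -> bool:
    return a >= b


def gt(a: int, b: int) -> bool:
    return a > b


def lex_ge(xs: Sequence, ys: Sequence) -> bool:
    return lex(ge, gt, xs, ys, False)


def lex_gt(xs: Sequence, ys: Sequence) -> bool:
    return lex(ge, gt, xs, ys, True)


DOMAIN = range(3)
LISTS = [xs for n in range(4) for xs in product(DOMAIN, repeat=n)]  # lists of length <= 3

print(co_compatible(ge, gt, ge, gt, DOMAIN))                 # True (Proposition 3 here)
print(co_compatible(lex_ge, lex_gt, lex_ge, lex_gt, LISTS))  # True (Lemma 5 here)
print(co_compatible(ge, gt, ge, ge, DOMAIN))                 # False: (>=, >=) is no order pair
```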


We arrive at the main lemma for WPO. 


Lemma 6. If $(\geq, >)$ and $(\succsim, \succ)$ are order pairs on $A$ and $\mathcal{F}$, and $(\mathcal{A}, \geq)$ is $\pi$-simple, then WPO and $\overline{\mathrm{WPO}}$ are co-compatible.


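Proof. We show that neither $s \sqsupseteq_{\mathrm{WPO}} t \land s \sqsubset_{\overline{\mathrm{WPO}}} t$ nor $s \sqsupset_{\mathrm{WPO}} t \land s \sqsubseteq_{\overline{\mathrm{WPO}}} t$ holds for any $s$ and $t$, by induction on the structure of $s$ and then $t$. Let us assume $s \sqsupseteq_{\mathrm{WPO}} t$ and prove $s \not\sqsubset_{\overline{\mathrm{WPO}}} t$. The other claim is analogous. We proceed by case analysis on the derivation of $s \sqsupseteq_{\mathrm{WPO}} t$.

1. $\mathcal{A} \models s > t$: Then $s \sqsubset_{\overline{\mathrm{WPO}}} t$ cannot hold, as it demands $\mathcal{A} \models s \not> t$ (or $s \not\geq t$).
2. $\mathcal{A} \models s \geq t$: Then $\mathcal{A} \models s \not\geq t$ cannot happen, and thus $s \sqsubset_{\overline{\mathrm{WPO}}} t$ must be due to case 2 of Definition 5. There are the following subcases for $s \sqsupseteq_{\mathrm{WPO}} t$:
   (a) $s = f(s_1,\dots,s_n)$ and $s_i \sqsupseteq_{\mathrm{WPO}} t$ for some $i \in \pi(f)$: By the induction hypothesis on $s$, we have $s_i \not\sqsubset_{\overline{\mathrm{WPO}}} t$, and thus $s \sqsubset_{\overline{\mathrm{WPO}}} t$ can only be due to (2a). So $t = g(t_1,\dots,t_m)$ and $s \sqsubseteq_{\overline{\mathrm{WPO}}} t_j$ for some $j \in \pi(g)$. Then $s \not\sqsupset_{\mathrm{WPO}} t_j$ by the induction hypothesis on $t$. On the contrary, we must have $s \sqsupset_{\mathrm{WPO}} t_j$: by Lemma 1-1 we have $s \sqsupset_{\mathrm{WPO}} s_i \sqsupseteq_{\mathrm{WPO}} t \sqsupset_{\mathrm{WPO}} t_j$, and hence $s \sqsupset_{\mathrm{WPO}} t_j$ as $(\sqsupseteq_{\mathrm{WPO}}, \sqsupset_{\mathrm{WPO}})$ is an order pair.
   (b) $s = f(s_1,\dots,s_n)$, $t = g(t_1,\dots,t_m)$, and $s \sqsupset_{\mathrm{WPO}} t_j$ for every $j \in \pi(g)$: By the induction hypothesis on $t$, we have $s \not\sqsubseteq_{\overline{\mathrm{WPO}}} t_j$ for any $j \in \pi(g)$. Thus $s \sqsubset_{\overline{\mathrm{WPO}}} t$ must be due to (2b). We proceed by further considering the following two possibilities.
       i. $f \succ g$: As neither $f \not\succsim g$ nor $f \not\succ g$ holds, $s \sqsubset_{\overline{\mathrm{WPO}}} t$ is not possible.
       ii. $f \succsim g$ and $\pi_f(s_1,\dots,s_n) \sqsupseteq^{\mathrm{lex}}_{\mathrm{WPO}} \pi_g(t_1,\dots,t_m)$: As $f \not\succsim g$ does not hold, (2b-i) is not applicable to have $s \sqsubset_{\overline{\mathrm{WPO}}} t$. By Lemma 5 and the induction hypothesis, we have $\pi_f(s_1,\dots,s_n) \not\sqsubset^{\mathrm{lex}}_{\overline{\mathrm{WPO}}} \pi_g(t_1,\dots,t_m)$, and thus (2b-ii) is not applicable either.
   (c) $s = t \in \mathcal{V}$: Then clearly $s \sqsubset_{\overline{\mathrm{WPO}}} t$ cannot hold.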
6 Conditional Rewriting


Conditional term rewriting (cf. [27]) is an extension of term rewriting so that 
rewrite rules can be guarded by conditions. We are interested in the “oriented” 
variants, as they naturally correspond to functional programming concepts such 
as where clauses of Haskell or when clauses of OCaml. 

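A conditional rewrite rule $l \to r \Leftarrow \varphi$ consists of terms $l$ and $r$, and a list $\varphi$ of pairs of terms. We may omit “$\Leftarrow []$” and write $s_1 \twoheadrightarrow t_1, \dots, s_n \twoheadrightarrow t_n$ for $[(s_1,t_1),\dots,(s_n,t_n)]$. A conditional TRS (CTRS) $\mathcal{R}$ is a set of conditional rewrite rules. A CTRS $\mathcal{R}$ yields the rewrite preorder $\to_{\mathcal{R}}^*$ by the following derivation rules [22]:

$$\frac{}{s \to_{\mathcal{R}}^* s}\;\textsc{Refl} \qquad \frac{s \to_{\mathcal{R}}^* t \quad t \to_{\mathcal{R}}^* u}{s \to_{\mathcal{R}}^* u}\;\textsc{Trans} \qquad \frac{s_i \to_{\mathcal{R}}^* s_i'}{f(s_1,\dots,s_i,\dots,s_n) \to_{\mathcal{R}}^* f(s_1,\dots,s_i',\dots,s_n)}\;\textsc{Mono}$$

$$\frac{s_1\theta \to_{\mathcal{R}}^* t_1\theta \quad \cdots \quad s_n\theta \to_{\mathcal{R}}^* t_n\theta}{l\theta \to_{\mathcal{R}}^* r\theta}\;\textsc{Rule} \quad \text{if } (l \to r \Leftarrow s_1 \twoheadrightarrow t_1, \dots, s_n \twoheadrightarrow t_n) \in \mathcal{R}$$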


To approximate reachability with respect to CTRSs by means of (co-)rewrite 
pairs, one needs to be careful when dealing with conditions. 


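Example 6. Consider the following CTRS:

$$\mathcal{R}_{\mathrm{fg}} = \{\, \mathsf{f}(x) \to x,\ \mathsf{g}(x) \to y \Leftarrow \mathsf{f}(x) \twoheadrightarrow y \,\}$$

and a reachability atom $\mathsf{g}(x) \twoheadrightarrow \mathsf{f}(x)$. One might expect that a rewrite preorder $\sqsupseteq$ such that

$$\mathsf{f}(x) \sqsupseteq x \qquad \mathsf{g}(x) \sqsupseteq y \ \text{ if } \ \mathsf{f}(x) \sqsupseteq y$$

can over-approximate $\to_{\mathcal{R}_{\mathrm{fg}}}^*$, but this is unfortunately false. For instance, any LPO satisfies the above constraints: $\mathsf{f}(x) \sqsupseteq_{\mathrm{LPO}} x$ as LPO is a simplification order, and the second constraint also vacuously holds as the condition $\mathsf{f}(x) \sqsupseteq_{\mathrm{LPO}} y$ is false. However, it is unsound to conclude that $\mathsf{g}(x) \twoheadrightarrow \mathsf{f}(x)$ is $\mathcal{R}_{\mathrm{fg}}$-unsatisfiable even if $\mathsf{g}(x) \sqsubset_{\mathrm{LPO}} \mathsf{f}(x)$: by setting $\mathsf{f} \succ \mathsf{g}$ one can have $\mathsf{g}(x) \sqsubset_{\mathrm{LPO}} \mathsf{f}(x)$ and $\mathsf{g}(x) \sqsubset_{\overline{\mathrm{LPO}}} \mathsf{f}(x)$, but $\mathsf{g}(x) \to_{\mathcal{R}_{\mathrm{fg}}} \mathsf{f}(x)$ (take $y := \mathsf{f}(x)$; the condition $\mathsf{f}(x) \twoheadrightarrow \mathsf{f}(x)$ holds by reflexivity).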


A solution is to use co-rewrite pairs already for dealing with conditions. 


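Proposition 4. If $(\sqsupseteq, \sqsubset)$ is a co-rewrite pair, $(l \to r \Leftarrow \varphi) \in \mathcal{R}$ implies $l \sqsupseteq r$ or $u \sqsubset v$ for some $u \twoheadrightarrow v \in \varphi$, and $s \sqsubset t$, then $s \twoheadrightarrow t$ is $\mathcal{R}$-unsatisfiable.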


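Proof. We show that $s \to_{\mathcal{R}}^* t$ implies $s \sqsupseteq t$. This is sufficient, since then $s\theta \to_{\mathcal{R}}^* t\theta$ implies $s\theta \sqsupseteq t\theta$, while $s \sqsubset t$ demands $s\theta \sqsubset t\theta$, which is not possible since ${\sqsupseteq} \cap {\sqsubset} = \emptyset$. The claim is proved by induction on the derivation of $s \to_{\mathcal{R}}^* t$.

- Refl: Since $\sqsupseteq$ is reflexive, we have $s \sqsupseteq s$.
- Trans: We have $s \to_{\mathcal{R}}^* t$ and $t \to_{\mathcal{R}}^* u$ as premises, and $s \sqsupseteq t$ and $t \sqsupseteq u$ by the induction hypothesis. Since $\sqsupseteq$ is transitive, we conclude $s \sqsupseteq u$.
- Mono: We have $s_i \to_{\mathcal{R}}^* s_i'$ as a premise and $s_i \sqsupseteq s_i'$ by the induction hypothesis. Since $\sqsupseteq$ is closed under contexts, we get $f(s_1,\dots,s_i,\dots,s_n) \sqsupseteq f(s_1,\dots,s_i',\dots,s_n)$.
- Rule: We have $(l \to r \Leftarrow s_1 \twoheadrightarrow t_1, \dots, s_n \twoheadrightarrow t_n) \in \mathcal{R}$, and for every $i \in \{1,\dots,n\}$ we have $s_i\theta \to_{\mathcal{R}}^* t_i\theta$ as a premise and $s_i\theta \sqsupseteq t_i\theta$ by the induction hypothesis. Since ${\sqsupseteq} \cap {\sqsubset} = \emptyset$, we get $s_i\theta \not\sqsubset t_i\theta$. Since $\sqsubset$ is closed under substitutions, we conclude $s_i \not\sqsubset t_i$ for every $i \in \{1,\dots,n\}$. By assumption, this entails $l \sqsupseteq r$, and since $\sqsupseteq$ is closed under substitutions, we conclude $l\theta \sqsupseteq r\theta$.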


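Example 7. Consider the following singleton CTRS:

$$\mathcal{R}_{\mathrm{ab}} := \{\, \mathsf{a} \to \mathsf{b} \Leftarrow \mathsf{b} \twoheadrightarrow \mathsf{a} \,\}$$

Proposition 4 combined with LPO or WPO induced by a partial precedence such that $\mathsf{a} \not\succsim \mathsf{b}$ and $\mathsf{b} \not\succsim \mathsf{a}$ proves that $\mathsf{a} \twoheadrightarrow \mathsf{b}$ is $\mathcal{R}_{\mathrm{ab}}$-unsatisfiable: clearly $\mathsf{b} \sqsubset_{\overline{\mathrm{LPO}}} \mathsf{a}$ and $\mathsf{a} \sqsubset_{\overline{\mathrm{LPO}}} \mathsf{b}$ by case (2b-i) of Definition 5. On the other hand, Proposition 4 with the term ordering induced by a totally ordered algebra $(\mathcal{A}, \geq)$ cannot solve the problem, since $\mathcal{A} \models \mathsf{a} \not\geq \mathsf{b}$ implies $\mathcal{A} \models \mathsf{b} > \mathsf{a}$ by totality, which then demands $\mathcal{A} \models \mathsf{a} \geq \mathsf{b}$ to satisfy the assumption of Proposition 4. For the same reason, WPO induced by a totally ordered algebra and a total precedence cannot handle the problem either.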


Note that the condition of the rule in $\mathcal{R}_{\mathrm{ab}}$ is unsatisfiable, and this is one of the two cases where Proposition 4 is effective. The other case is when a condition can be ignored. Proposition 4 is incomplete when conditions are essential, as in Example 6. For dealing with essential conditional rules, the variable binding in a rule should be taken into account. At this point, a model-oriented formulation (à la [19]) seems more suitable.


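Definition 8 (model of CTRS). We extend the notation $[s \sqsupseteq t]\alpha$ of Definition 2 to $[\varphi]\alpha$ for an arbitrary Boolean formula $\varphi$ with the single binary predicate $\sqsupseteq$ in the obvious manner. We say $\mathcal{A} = (A, [\cdot])$ validates $\varphi$, written $\mathcal{A} \models \varphi$, iff $[\varphi]\alpha$ holds for every $\alpha : \mathcal{V} \to A$. We say a related $\mathcal{F}$-algebra $(\mathcal{A}, \sqsupseteq)$ is a model of a CTRS $\mathcal{R}$ iff $\mathcal{A} \models l \sqsupseteq r \lor s_1 \not\sqsupseteq t_1 \lor \dots \lor s_n \not\sqsupseteq t_n$³ for every $(l \to r \Leftarrow s_1 \twoheadrightarrow t_1, \dots, s_n \twoheadrightarrow t_n) \in \mathcal{R}$.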


Besides minor simplifications (e.g., we do not need two predicates as we 
are only concerned with reachability in many steps in this paper), the major 
difference with [19] is that here we do not encode the monotonicity or order 
axioms into logical formulas (using R of [19]). Instead, we impose these properties 
as meta-level assumptions over models. 


Theorem 3. For a CTRS $\mathcal{R}$, $s \twoheadrightarrow t$ is $\mathcal{R}$-unsatisfiable if and only if there exists a monotone preordered model $(\mathcal{A}, \geq)$ of $\mathcal{R}$ such that $\mathcal{A} \models s \not\geq t$.


³ Here the formula $s \not\sqsupseteq t$ is a shorthand for $\neg\, s \sqsupseteq t$.
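Proof. We start with the “if” direction. Let $(\mathcal{A}, \geq)$ be a monotone preordered model of $\mathcal{R}$. As in Proposition 4, it suffices to show that $s \to_{\mathcal{R}}^* t$ implies $\mathcal{A} \models s \geq t$. The claim is proved by induction on the derivation of $s \to_{\mathcal{R}}^* t$.

- Refl: Since $\geq$ is reflexive, we have $\mathcal{A} \models s \geq s$.
- Trans: We have $s \to_{\mathcal{R}}^* t$ and $t \to_{\mathcal{R}}^* u$ as premises, and $\mathcal{A} \models s \geq t$ and $\mathcal{A} \models t \geq u$ by the induction hypothesis. Since $\geq$ is transitive, we conclude $\mathcal{A} \models s \geq u$.
- Mono: We have $s_i \to_{\mathcal{R}}^* s_i'$ as a premise and $\mathcal{A} \models s_i \geq s_i'$ by the induction hypothesis. Since $(\mathcal{A}, \geq)$ is monotone, we get $\mathcal{A} \models f(s_1,\dots,s_i,\dots,s_n) \geq f(s_1,\dots,s_i',\dots,s_n)$.
- Rule: We have $(l \to r \Leftarrow s_1 \twoheadrightarrow t_1, \dots, s_n \twoheadrightarrow t_n) \in \mathcal{R}$, and for every $i \in \{1,\dots,n\}$ we have $s_i\theta \to_{\mathcal{R}}^* t_i\theta$ as a premise and $\mathcal{A} \models s_i\theta \geq t_i\theta$ by the induction hypothesis. Since $(\mathcal{A}, \geq)$ is a model of $\mathcal{R}$, and by the fact that validity is closed under substitutions, we have $\mathcal{A} \models l\theta \geq r\theta \lor s_1\theta \not\geq t_1\theta \lor \dots \lor s_n\theta \not\geq t_n\theta$. Together with the induction hypotheses we conclude $\mathcal{A} \models l\theta \geq r\theta$.

Next consider the “only if” direction. We show that $(\mathcal{T}(\mathcal{F},\mathcal{V}), \to_{\mathcal{R}}^*)$ is a model of $\mathcal{R}$, that is, for every $(l \to r \Leftarrow s_1 \twoheadrightarrow t_1, \dots, s_n \twoheadrightarrow t_n) \in \mathcal{R}$, we show $\mathcal{T}(\mathcal{F},\mathcal{V}) \models l \to_{\mathcal{R}}^* r \lor s_1 \not\to_{\mathcal{R}}^* t_1 \lor \dots \lor s_n \not\to_{\mathcal{R}}^* t_n$. This means $l\theta \to_{\mathcal{R}}^* r\theta$ for every $\theta : \mathcal{V} \to \mathcal{T}(\mathcal{F},\mathcal{V})$ such that $s_1\theta \to_{\mathcal{R}}^* t_1\theta, \dots, s_n\theta \to_{\mathcal{R}}^* t_n\theta$, which is immediate by Rule. The fact that $\to_{\mathcal{R}}^*$ is a preorder and closed under contexts is also immediate. Finally, $s \twoheadrightarrow t$ being $\mathcal{R}$-unsatisfiable means that $s\theta \not\to_{\mathcal{R}}^* t\theta$ for any $\theta : \mathcal{V} \to \mathcal{T}(\mathcal{F},\mathcal{V})$, that is, $\mathcal{T}(\mathcal{F},\mathcal{V}) \models s \not\to_{\mathcal{R}}^* t$.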


Putting implementation issues aside, it is trivial to use semantic (termina- 
tion) methods in Theorem 3. 


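Example 8. Consider again the CTRS $\mathcal{R}_{\mathrm{fg}}$ of Example 6. The monotone ordered $\{\mathsf{f}, \mathsf{g}\}$-algebra $(\mathbb{N}, [\cdot], \geq)$ defined by

$$[\mathsf{f}](x) = x \qquad [\mathsf{g}](x) = x + 1$$

is a model of $\mathcal{R}_{\mathrm{fg}}$, since for arbitrary $x, y \in \mathbb{N}$ we have

$$[\mathsf{f}](x) \geq x \qquad [\mathsf{g}](x) = x + 1 \geq y \;\lor\; [\mathsf{f}](x) = x \not\geq y$$

Then, with Theorem 3 we can show that $\mathsf{f}(x) \twoheadrightarrow \mathsf{g}(x)$ is $\mathcal{R}_{\mathrm{fg}}$-unsatisfiable, as $[\mathsf{f}](x) = x \not\geq x + 1 = [\mathsf{g}](x)$ for every $x \in \mathbb{N}$.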
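The disjunctive model condition of Example 8 can be checked on a finite sample as follows. The conditional rule is encoded as the clause $[\mathsf{g}](x) \geq y \lor [\mathsf{f}](x) \not\geq y$ from Definition 8; the sample bound is an arbitrary choice, so this is a sanity check rather than a proof.

```python
from itertools import product


def f(x: int) -> int:
    return x        # [f](x) = x


def g(x: int) -> int:
    return x + 1    # [g](x) = x + 1


def is_model_on_sample(bound: int = 50) -> bool:
    for x, y in product(range(bound + 1), repeat=2):
        if not f(x) >= x:                         # f(x) -> x
            return False
        if not (g(x) >= y or not f(x) >= y):      # g(x) -> y <= f(x) ->> y
            return False
    return True


def goal_refuted_on_sample(bound: int = 50) -> bool:
    """f(x) ->> g(x) is refuted if [f](x) >= [g](x) fails for every x."""
    return all(not f(x) >= g(x) for x in range(bound + 1))


print(is_model_on_sample(), goal_refuted_on_sample())  # True True
```

Unlike in the unconditional case, the clause for the conditional rule is a genuine disjunction: for $y = x + 2$ the first disjunct fails, and the clause holds only because the condition $[\mathsf{f}](x) \geq y$ is refuted.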


To use $\mathrm{WPO}(\mathcal{A})$ in combination with Theorem 3, we need to validate formulas with predicate $\sqsupseteq_{\mathrm{WPO}(\mathcal{A})}$ in the term algebra $\mathcal{T}(\mathcal{F},\mathcal{V})$. We encode these formulas into formulas with predicates $\geq$ and $>$, which are then interpreted in $\mathcal{A}$.


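Definition 9 (formal WPO). Let $(\geq, >)$ and $(\succsim, \succ)$ be pairs of relations over some set and over $\mathcal{F}$, respectively, and let $\pi$ be a partial status. We define $\mathrm{wpo}(\pi, \geq, >, \succsim, \succ)$, or wpo for short, to be the pair $(\sqsupseteq_{\mathrm{wpo}}, \sqsupset_{\mathrm{wpo}})$, where for terms $s, t \in \mathcal{T}(\mathcal{F},\mathcal{V})$, $s \sqsupset_{\mathrm{wpo}} t$ and $s \sqsupseteq_{\mathrm{wpo}} t$ are Boolean formulas defined as follows:

$$s \sqsupset_{\mathrm{wpo}} t \;:=\; s > t \lor (s \geq t \land \varphi)$$

where $\varphi$ is FALSE if $s \in \mathcal{V}$ and is $\bigvee_{i \in \pi(f)} s_i \sqsupseteq_{\mathrm{wpo}} t \lor \psi$ if $s = f(s_1,\dots,s_n)$, and $\psi$ is FALSE if $t \in \mathcal{V}$ and is

$$\bigwedge_{j \in \pi(g)} s \sqsupset_{\mathrm{wpo}} t_j \land \bigl(f \succ g \lor (f \succsim g \land \pi_f(s_1,\dots,s_n) \sqsupset^{\mathrm{lex}}_{\mathrm{wpo}} \pi_g(t_1,\dots,t_m))\bigr)$$

if $t = g(t_1,\dots,t_m)$. Formula $s \sqsupseteq_{\mathrm{wpo}} t$ is defined analogously, except that $\varphi$ is TRUE if $s = t \in \mathcal{V}$, and $\sqsupset^{\mathrm{lex}}_{\mathrm{wpo}}$ in formula $\psi$ is replaced by $\sqsupseteq^{\mathrm{lex}}_{\mathrm{wpo}}$.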


We omit an easy proof that verifies that wpo encodes WPO:

Lemma 7. $s \sqsupseteq_{\mathrm{WPO}(\mathcal{A})} t$ iff $\mathcal{A} \models s \sqsupseteq_{\mathrm{wpo}} t$.


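Note carefully that $s \not\sqsupseteq_{\mathrm{WPO}(\mathcal{A})} t$ means $\mathcal{A} \not\models s \sqsupseteq_{\mathrm{wpo}} t$, but not $\mathcal{A} \models s \not\sqsupseteq_{\mathrm{wpo}} t$. Hence we ensure $s \not\sqsupseteq_{\mathrm{WPO}(\mathcal{A})} t$ by $\mathcal{A} \models s \sqsubset_{\overline{\mathrm{wpo}}} t$, where $\overline{\mathrm{wpo}}$ denotes $\mathrm{wpo}(\pi, \not<, \not\leq, \not\prec, \not\precsim)$.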


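Theorem 4. If $\mathcal{R}$ is a CTRS, $(\geq, >)$ and $(\succsim, \succ)$ are order pairs on $A$ and $\mathcal{F}$, $(\mathcal{A}, \geq)$ is $\pi$-simple and monotone, $\mathcal{A} \models l \sqsupseteq_{\mathrm{wpo}} r \lor u_1 \sqsubset_{\overline{\mathrm{wpo}}} v_1 \lor \dots \lor u_n \sqsubset_{\overline{\mathrm{wpo}}} v_n$ for every $(l \to r \Leftarrow u_1 \twoheadrightarrow v_1, \dots, u_n \twoheadrightarrow v_n) \in \mathcal{R}$, and $\mathcal{A} \models s \sqsubset_{\overline{\mathrm{wpo}}} t$, then $s \twoheadrightarrow t$ is $\mathcal{R}$-unsatisfiable.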


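Proof. We apply Theorem 3. To this end, we first show that $(\mathcal{T}(\mathcal{F},\mathcal{V}), \sqsupseteq_{\mathrm{WPO}(\mathcal{A})})$ is a monotone preordered model of $\mathcal{R}$. Monotonicity and preorderedness are due to Proposition 1. For being a model, let $(l \to r \Leftarrow u_1 \twoheadrightarrow v_1, \dots, u_n \twoheadrightarrow v_n) \in \mathcal{R}$. Due to the assumption and Lemma 7, we have $l \sqsupseteq_{\mathrm{WPO}(\mathcal{A})} r \lor u_1 \sqsubset_{\overline{\mathrm{WPO}}(\mathcal{A})} v_1 \lor \dots \lor u_n \sqsubset_{\overline{\mathrm{WPO}}(\mathcal{A})} v_n$. Due to Lemmas 2 and 4, we get $l\theta \sqsupseteq_{\mathrm{WPO}(\mathcal{A})} r\theta \lor u_1\theta \sqsubset_{\overline{\mathrm{WPO}}(\mathcal{A})} v_1\theta \lor \dots \lor u_n\theta \sqsubset_{\overline{\mathrm{WPO}}(\mathcal{A})} v_n\theta$ for every $\theta : \mathcal{V} \to \mathcal{T}(\mathcal{F},\mathcal{V})$. With Proposition 2 we conclude $\mathcal{T}(\mathcal{F},\mathcal{V}) \models l \sqsupseteq_{\mathrm{WPO}(\mathcal{A})} r \lor u_1 \not\sqsupseteq_{\mathrm{WPO}(\mathcal{A})} v_1 \lor \dots \lor u_n \not\sqsupseteq_{\mathrm{WPO}(\mathcal{A})} v_n$. Finally, we need $\mathcal{T}(\mathcal{F},\mathcal{V}) \models s \not\sqsupseteq_{\mathrm{WPO}(\mathcal{A})} t$, i.e., $s\theta \not\sqsupseteq_{\mathrm{WPO}(\mathcal{A})} t\theta$ for any $\theta : \mathcal{V} \to \mathcal{T}(\mathcal{F},\mathcal{V})$. As we assume $s \sqsubset_{\overline{\mathrm{WPO}}(\mathcal{A})} t$, by Lemma 4 we have $s\theta \sqsubset_{\overline{\mathrm{WPO}}(\mathcal{A})} t\theta$. By Proposition 2 we conclude $s\theta \not\sqsupseteq_{\mathrm{WPO}(\mathcal{A})} t\theta$.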


7 Experiments 


The proposed methods are implemented in the termination prover NaTT [35], 
available at https://www.trs.cm.is.nagoya-u.ac.jp/NaTT/. 

Internally, NaTT reduces the problem of finding an algebra $\mathcal{A}$ that makes $(\mathcal{A}, \geq)$ a model of a TRS $\mathcal{R}$ (or $\mathcal{R} \subseteq {\sqsupseteq_{\mathrm{WPO}(\mathcal{A})}}$) to a satisfiability modulo theories (SMT) problem, which is then solved by the backend SMT solver z3 [26]. The implementation of Theorem 1 and Corollary 1 is a trivial adaptation of the termination methods. Corollary 2 is also trivial for totally ordered carriers, since $\mathcal{A} \models s \not\geq t$ is equivalent to $\mathcal{A} \models s < t$. Matrix/tuple interpretations are also easy, since $\mathcal{A} \models (a_1,\dots,a_n) \not\geq (b_1,\dots,b_n)$ is equivalent to $\mathcal{A} \models a_1 < b_1 \lor \dots \lor a_n < b_n$. Theorem 2 with WPO is obtained by parametrizing WPO.
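The encoding of $\not\geq$ for tuples rests on a simple equivalence for the pointwise order. Since it holds for each assignment, it also holds under $\models$, where the disjunction stays inside the formula. The following Python snippet checks it exhaustively for pairs over $\{0, 1, 2\}$ (an arbitrary small range of ours).

```python
from itertools import product
from typing import Sequence


def ge_pointwise(a: Sequence[int], b: Sequence[int]) -> bool:
    """(a_1, ..., a_n) >= (b_1, ..., b_n) in the pointwise extension of >=."""
    return all(ai >= bi for ai, bi in zip(a, b))


def not_ge_encoding(a: Sequence[int], b: Sequence[int]) -> bool:
    """The disjunction a_1 < b_1 or ... or a_n < b_n used to express 'not >='."""
    return any(ai < bi for ai, bi in zip(a, b))


vectors = list(product(range(3), repeat=2))
assert all((not ge_pointwise(a, b)) == not_ge_encoding(a, b) for a in vectors for b in vectors)
print(not_ge_encoding((0, 5), (1, 0)))  # True: the first coordinate is smaller
```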
Table 1. Experimental results.


TRS CTRS 

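Method | $\mathcal{R}_{\mathrm{add}}$ | $\mathcal{R}_{\mathrm{mat}}$ | $\mathcal{R}_{\mathsf{A}}$ | $\mathcal{R}_{\mathrm{kbo}}$ | $\mathcal{R}_{\mathrm{wpo}}$ | COPS(15) | $\mathcal{R}_{\mathrm{ab}}$ | $\mathcal{R}_{\mathrm{fg}}$ | COPS(126)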

Sum 6 v |15 

Sumt 6 v |24 

Sum v 5 28 

Mat v 6 v |v | 25 (TO:88) 

LPO v 5 v 19 

WPO(Sum) sv iv l6 v Iv li 

WPO(Sumt) viv 6 vV |v | 25 (80:29) 

WPO(Sum- ) 6 y 15 

infChecker vV v 13 vV |v | 51+42 (TO:25) 

CO3 v 5 v 20 

NaTT 2.1 3 19 

NaTT 2.2 v v V |v v 6 (TO:4) vV |v |31 (TO:79) 

Theorem 3 needs some tricks. In the unconditional case, finding a desired algebra $\mathcal{A}$ can be encoded into SMT over quantifier-free linear arithmetic for a large class of $\mathcal{A}$ [36]. For the conditional case, we need to find ($\exists$) parameters that validate ($\forall$) a disjunctive clause. Farkas’ lemma would reduce such a problem to quantifier-free SMT, but then the resulting problem is nonlinear. Experimentally, we observe that our backend z3 performs better on quantified linear arithmetic than on quantifier-free nonlinear arithmetic, and hence we choose to keep the $\forall$ quantifiers.

We conducted experiments using the examples presented in the paper and 
the examples in the INF category of the standard benchmark set COPS. The 
execution environment is StarExec [31] with the same settings as CoCo 2019. 

Many COPS examples contain conjunctive reachability constraints of the form
s1 →* t1 ∧ ⋯ ∧ sn →* tn. In this experiment we naively collapsed such a constraint
into tp(s1, ..., sn) →* tp(t1, ..., tn) by introducing a fresh function symbol tp.
Two benchmarks exceed the scope of oriented CTRSs, on which NaTT immediately
gives up.
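The naive collapsing of conjunctive reachability constraints can be sketched as follows. This is an illustrative Python sketch, not NaTT's code: terms are assumed to be nested tuples `(symbol, arg1, ...)` with variables as plain strings, and the freshness check only inspects the constraints themselves (a full implementation would also check the rewrite system).

```python
def symbols(term):
    """Function symbols occurring in a term; variables are plain strings."""
    if isinstance(term, str):
        return set()
    head, *args = term
    result = {head}
    for arg in args:
        result |= symbols(arg)
    return result


def collapse(constraints, tp="tp"):
    """Turn [(s1, t1), ..., (sn, tn)] into the single pair (tp(s1..sn), tp(t1..tn))."""
    used = set()
    for s, t in constraints:
        used |= symbols(s) | symbols(t)
    if tp in used:
        raise ValueError(f"{tp!r} is not fresh")
    sources = tuple(s for s, _ in constraints)
    targets = tuple(t for _, t in constraints)
    return (tp,) + sources, (tp,) + targets


s, t = collapse([("x", ("a",)), (("f", "y"), "y")])
print(s)  # ('tp', 'x', ('f', 'y'))
print(t)  # ('tp', ('a',), 'y')
```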

As co-rewrite pairs we tested the algebras Sum, Sum+, Sum−, Mat, LPO,
and WPO. The basic algebra Sum = (Z, [·]) is given by [f](x1, ..., xn) =
c0 + Σ_{i=1..n} ci·xi, where c0 ∈ Z and c1, ..., cn ∈ {0, 1}. Sum+ and Sum− are
defined similarly, where the ranges of c0, which also determine the carrier, are N and
Z≤0, respectively. The algebra Mat represents 2D matrix interpretations.
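To make the role of these algebras concrete, the following sketch checks non-reachability with an interpretation of the Sum+ kind (carrier N, c0 ∈ N, ci ∈ {0, 1}). It is a hypothetical illustration, not the SMT encoding used in NaTT: the interpretation is supplied by hand rather than searched for, the function names are made up, and the coefficient-wise comparisons are sufficient conditions for ≥ and < to hold under every assignment of natural numbers to variables.

```python
from collections import Counter

# A linear polynomial c0 + sum(coeff[x] * x) is stored as (c0, Counter of coefficients).


def interpret(term, interp):
    """interp[f] = (c0, [c1, ..., cn]) means [f](x1, ..., xn) = c0 + c1*x1 + ... + cn*xn."""
    if isinstance(term, str):                      # a variable
        return 0, Counter({term: 1})
    head, *args = term
    c0, coeffs = interp[head]
    const, lin = c0, Counter()
    for c, arg in zip(coeffs, args):
        a_const, a_lin = interpret(arg, interp)
        const += c * a_const
        for var, k in a_lin.items():
            lin[var] += c * k
    return const, lin


def geq_everywhere(p, q):
    """Sufficient check over N that p >= q for every assignment."""
    (pc, pl), (qc, ql) = p, q
    return pc >= qc and all(pl[v] >= ql[v] for v in set(pl) | set(ql))


def less_everywhere(p, q):
    """Sufficient check over N that p < q for every assignment."""
    (pc, pl), (qc, ql) = p, q
    return pc < qc and all(pl[v] <= ql[v] for v in set(pl) | set(ql))


def unreachable(rules, s, t, interp):
    """If [l] >= [r] for all rules, values never increase along rewriting,
    so [s] < [t] everywhere rules out s ->* t."""
    model = all(geq_everywhere(interpret(l, interp), interpret(r, interp))
                for l, r in rules)
    return model and less_everywhere(interpret(s, interp), interpret(t, interp))


interp = {"f": (1, [1]), "g": (0, [1]), "a": (0, [])}   # [f](x) = 1 + x, [g](x) = x, [a] = 0
rules = [(("f", "x"), ("g", "x"))]
print(unreachable(rules, ("g", "x"), ("f", "x"), interp))  # True
```

With [a] = 0, `geq_everywhere` also confirms that x ≥ [a] for every natural-number value of x; over an unbounded carrier such as Z this comparison would fail.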

Table 1 presents the results. For TRSs, we can observe that our proposed
methods advance the state of the art, in the sense that they prove new examples
that no tool that previously participated in CoCo could handle. As there are only 15
TRS examples in the INF category of COPS 2021, we could not derive interesting
observations there. Taking CTRS examples into account, we see that Sum is not as
good as Sum+ or Sum−, although its carrier is bigger (Z versus N or Z≤0). This
phenomenon is explained as follows: for the latter two, one knows that variables are
bounded by 0 (from below or above, respectively), and hence one can have Sum+ ⊨ x ⊒ a
or Sum− ⊨ a ⊒ x by [a] = 0. Neither is possible when the carrier is unbounded.
This observation also suggests another choice of carriers that are bounded from
below and above, which is left for future work.

From the figures for the CTRS examples, Sum− performs best among our
methods. However, Mat and WPO(Sum+) solve more examples if the TRS examples
are also counted. It does not yet seem appropriate to judge practical significance
from these experiments.

Finally, as the default strategy of NaTT 2.2, we implemented the sequential
application of Sum−, LPO, WPO(Sum+), and Mat, run after the tests that NaTT
had already implemented. The improvement over the previous NaTT 2.1 should be
clear, although the number of timeouts (indicated by “TO:”) is significant.


8 Conclusion 


We proposed generalizations of termination techniques that can prove unsatisfia- 
bility of reachability, both for term rewriting and for conditional term rewriting. 
We implemented the approach in the termination prover NaTT, and experimen-
tally evaluated the significance of the proposed approach.

The implementation focused on evaluating the proposed methods separately.
The only implemented way of combining their power is a naive one: apply the
tests one by one until one succeeds. For future work, it will be interesting to
incorporate the proposed method into the existing frameworks [10,30].
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use, you will need to obtain permission directly from the copyright holder. 
Abstract. Explanations for description logic (DL) entailments provide
important support for the maintenance of large ontologies. The “justifica- 
tions” usually employed for this purpose in ontology editors pinpoint the 
parts of the ontology responsible for a given entailment. Proofs for entail- 
ments make the intermediate reasoning steps explicit, and thus explain 
how a consequence can actually be derived. We present an interactive 
system for exploring description logic proofs, called EVONNE, which visu- 
alizes proofs of consequences for ontologies written in expressive DLs. We 
describe the methods used for computing those proofs, together with a 
feature called signature-based proof condensation. Moreover, we evaluate 
the quality of generated proofs using real ontologies. 


1 Introduction 


Proofs generated by Automated Reasoning (AR) systems are sometimes pre- 
sented to humans in textual form to convince them of the correctness of a the- 
orem [9,11], but more often employed as certificates that can automatically be 
checked [20]. In contrast to the AR setting, where very long proofs may be 
needed to derive a deep mathematical theorem from very few axioms, DL-based 
ontologies are often very large, but proofs of a single consequence are usually of 
a more manageable size. For this reason, the standard method of explanation 
in description logic [8] has long been to compute so-called justifications, which 
point out a minimal set of source statements responsible for an entailment of 
interest. For example, the ontology editor Protégé¹ has supported the computation
of justifications since 2008 [12], which is very useful when working with large DL
ontologies. Nevertheless, it is often not obvious why a given consequence actually 
follows from such a justification [13]. Recently, this explanation capability has 
been extended towards showing full proofs with intermediate reasoning steps, 
but this is restricted to ontologies written in the lightweight DLs supported by 
the ELK reasoner [15,16], and the graphical presentation of proofs is very basic. 


1 https://protege.stanford.edu/.
In this paper, we present EVONNE, an interactive system for exploring proofs
of description logic entailments, using the methods for computing small
proofs presented in [3,5]. Initial prototypes of EVONNE were presented in [6,10],
but many improvements have been implemented since then. While EVONNE does
more than just visualizing proofs, this paper focuses on the proof component
of EVONNE: specifically, we give a brief overview of the interface for exploring
proofs, describe the proof generation methods implemented in the back-end,
and present an experimental evaluation of these proof generation methods in
terms of proof size and run time. The improved back-end uses Java libraries
that extract proofs using various methods, such as from the ELK calculus, or
forgetting-based proofs [3] using the forgetting tools LETHE [17] and FAME [21] in
a black-box fashion. The new front-end is visually more appealing than the
prototypes presented in [6,10], and allows users to inspect and explore proofs using various
interaction techniques, such as zooming and panning, collapsing and expanding, 
text manipulation, and compactness adjustments. Additional features include 
the minimization of the generated proofs according to various measures and the 
possibility to select a known signature that is used to automatically hide parts 
of the proofs that are assumed to be obvious for users with certain previous 
knowledge. Our evaluation shows that proof sizes can be significantly reduced 
in this way, making the proofs more user-friendly. EVONNE can be tried and 
downloaded at https://imld.de/evonne. The version of EVONNE described here, 
as well as the data and scripts used in our experiments, can be found at [2]. 


2 Preliminaries 


We recall some relevant notions for DLs; for a detailed introduction, see [8]. DLs
are decidable fragments of first-order logic (FOL) with a special, variable-free
syntax that use only unary and binary predicates, called concept names and role
names, respectively. These can be used to build complex concepts, which corre-
spond to first-order formulas with one free variable, and axioms corresponding to
first-order sentences. Which kinds of concepts and axioms can be built depends on
the expressivity of the DL used. Here we mainly consider the lightweight DL ELH
and the more expressive ALCH. We have the usual notion of FOL entailment
O ⊨ α of an axiom α from a finite set of axioms O, called an ontology. Of special
interest are entailments of atomic CIs (concept inclusions) of the form A ⊑ B,
where A and B are concept names. Following [3], we define proofs of O ⊨ α as
finite, acyclic, directed hypergraphs, where vertices v are labeled with axioms ℓ(v)
and hyperedges are of the form (S, d), with S a set of vertices and d a vertex such
that {ℓ(v) | v ∈ S} ⊨ ℓ(d); the leaves of a proof must be labeled by elements of O
and the root by α. In this paper, all proofs are trees, i.e. no vertex can appear in
the first component of multiple hyperedges (see Fig. 1).
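The structural conditions on tree-shaped proofs can be checked mechanically. The sketch below is a hypothetical Python helper, not EVONNE code: axioms are opaque strings, each hyperedge is a pair `(premises, conclusion)` of vertex names, and the entailment condition of each step, which requires a DL reasoner, is deliberately not checked.

```python
def check_tree_proof(labels, hyperedges, ontology, goal):
    """Check the structural conditions on a tree-shaped proof of `goal`.

    labels: dict mapping each vertex to its axiom.
    hyperedges: list of (premises, conclusion), premises being a set of vertices.
    The entailment condition of each step is NOT checked here.
    """
    premise_count = {v: 0 for v in labels}
    incoming = {v: [] for v in labels}
    for premises, conclusion in hyperedges:
        incoming[conclusion].append(premises)
        for v in premises:
            premise_count[v] += 1
    # Tree shape: no vertex is a premise of more than one hyperedge.
    if any(n > 1 for n in premise_count.values()):
        return False
    # Exactly one root (a vertex that is no premise), labeled with the goal.
    roots = [v for v, n in premise_count.items() if n == 0]
    if len(roots) != 1 or labels[roots[0]] != goal:
        return False
    # Leaves (vertices without incoming hyperedges) must be ontology axioms.
    if any(not incoming[v] and labels[v] not in ontology for v in labels):
        return False
    # Acyclicity and connectedness: depth-first search from the root.
    state = {}

    def visit(v):
        state[v] = "active"
        for premises in incoming[v]:
            for u in premises:
                if state.get(u) == "active":
                    return False          # back edge, i.e. a cycle
                if u not in state and not visit(u):
                    return False
        state[v] = "done"
        return True

    return visit(roots[0]) and len(state) == len(labels)


labels = {"r": "A <= C", "p1": "A <= B", "p2": "B <= C"}
edges = [(frozenset({"p1", "p2"}), "r")]
print(check_tree_proof(labels, edges, {"A <= B", "B <= C"}, "A <= C"))  # True
```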


3 The Graphical User Interface 


The user interface of EVONNE is implemented as a web application. To support 
users in understanding large proofs, they are offered various layout options and 
Fig. 1. Overview of EVONNE: a condensed proof in the bidirectional layout


interaction components. The proof visualization is linked to a second view show- 
ing the context of the proof in a relevant subset of the ontology. In this ontology 
view, interactions between axioms are visualized, so that users can understand 
the context of axioms occurring in the proof. The user can also examine possible 
ways to eliminate unwanted entailments in the ontology view. The focus of this 
system description, however, is on the proof component: we describe how the 
proofs are generated and how users can interact with the proof visualization. 
For details on the ontology view, we refer the reader to the workshop paper [6], 
where we also describe how EVONNE supports ontology repair. 


Initialization. After starting EVONNE for the first time, users create a new 
project, for which they specify an ontology file. They can then select an entailed 
atomic CI to be explained. The user can choose between different proof meth- 
ods, and optionally select a signature of known terms (cf. Sect. 4), which can be 
generated using the term selection tool Protégé-TS [14]. 


Layout. Proofs are shown as graphs with two kinds of vertices: colored vertices 
for axioms, gray ones for inference steps. By default, proofs are shown using a 
tree layout. To take advantage of the width of the display when dealing with 
long axioms, it is possible to show proofs in a vertical layout, placing axioms 
linearly below each other, with inferences represented through edges on the side 
(without the inference vertices). It is possible to automatically re-order vertices 
to minimize the distance between conclusion and premises in each step. The third 
layout option is the bidirectional layout (see Fig. 1), a tree layout where, initially, 
the entire proof is collapsed into a magic vertex that links the conclusion directly 
to its justification, and from which individual inference steps can be pulled out 
and pushed back from both directions. 
Exploration. In all views, each vertex is equipped with multiple functionalities
for exploring a proof. For proofs generated with ELK, clicking on an inference
vertex shows the inference rule used, and the particular inference with relevant
sub-elements highlighted in different colors. Axiom vertices show different buttons
when hovered over. In the standard tree layout, users can hide the sub-proof under
an axiom. They can also reveal the previous inference step or the entire sub-proof.
In the vertical layout, a button highlights and explains the inference of the
current axiom. In the bidirectional layout, the arrow buttons are used for pulling
inference steps out of the magic vertex, as well as pushing them back in.


Presentation. A minimap allows users to keep track of the overall structure 
of the proof, thus enriching the zooming and panning functionality. Users can 
adjust width and height of proofs through the options side-bar. Long axiom 
labels can be shortened in two ways: either by setting a fixed size to all vertices, 
or by abbreviating names based on capital letters. Afterwards, it is possible to 
restore the original labels individually. 


4 Proof Generation 


To obtain the proofs that are shown to the user, we implemented different proof 
generation techniques, some of which were initially described in [3]. For ELH 
ontologies, proofs can be generated natively by the DL reasoner ELK [16]. These 
proofs use rules from the calculus described in [16]. We apply the Dijkstra-like 
algorithm introduced in [4,5] to compute a minimized proof from the ELK out- 
put. This minimization can be done w.r.t. different measures, such as the size, 
depth, or weighted sum (where each axiom is weighted by its size), as long as 
they are monotone and recursive [5]. For ontologies outside of the ELH frag- 
ment, we use the forgetting-based approach originally described in [3], for which 
we now implemented two alternative algorithms for computing more compact 
proofs (Sect. 4.1). Finally, independently of the proof generation method, one 
can specify a signature of known terms. This signature contains terminology 
that the user is familiar with, so that entailments using only those terms do not 
need to be explained. The condensation of proofs w.r.t. signatures is described 
in Sect. 4.2. 
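The Dijkstra-like minimization can be illustrated for the size measure. The following is a sketch in the style of Knuth's generalization of Dijkstra's algorithm to hypergraphs, assuming that the candidate inferences (for example, those produced by ELK) are given as `(premises, conclusion)` pairs with `premises` a frozenset of axioms; the cost of an axiom is the size of its smallest tree proof. The optional `known` argument seeds extra axioms with single-vertex proofs, in the spirit of the signature-based condensation of Sect. 4.2. All names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import count


def minimal_tree_sizes(ontology, inferences, known=()):
    """Return {axiom: (size of smallest tree proof, index of last inference or None)}."""
    tie = count()                         # tie-breaker so the heap never compares axioms
    heap = []
    for axiom in set(ontology) | set(known):
        heapq.heappush(heap, (1, next(tie), axiom, None))   # single-vertex proofs
    uses = defaultdict(list)              # axiom -> inferences that use it as a premise
    missing = []                          # number of premises not yet settled
    for idx, (premises, conclusion) in enumerate(inferences):
        missing.append(len(premises))
        for p in premises:
            uses[p].append(idx)
        if not premises:
            heapq.heappush(heap, (1, next(tie), conclusion, idx))
    best = {}
    while heap:
        cost, _, axiom, via = heapq.heappop(heap)
        if axiom in best:                 # already settled with a smaller or equal cost
            continue
        best[axiom] = (cost, via)
        for idx in uses[axiom]:
            missing[idx] -= 1
            if missing[idx] == 0:         # all premises settled: the inference fires
                premises, conclusion = inferences[idx]
                if conclusion not in best:
                    size = 1 + sum(best[p][0] for p in premises)
                    heapq.heappush(heap, (size, next(tie), conclusion, idx))
    return best


onto = {"a", "b", "c"}
infs = [(frozenset({"a", "b"}), "d"), (frozenset({"d"}), "goal"), (frozenset({"c"}), "goal")]
print(minimal_tree_sizes(onto, infs)["goal"])  # (2, 2)
```

Following the `via` indices backwards from the goal reconstructs a minimal proof. The greedy order is sound here because a proof's size is never smaller than the size of any of its sub-proofs.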


4.1 Forgetting-Based Proofs 


In a forgetting-based proof, proof steps represent inferences on concept or role
names using a forgetting operation. Given an ontology O and a predicate name x,
the result O⁻ˣ of forgetting x in O does not contain any occurrences of x, while
still capturing all entailments of O that do not use x [18]. In a forgetting-based
proof, an inference takes as premises a set P of axioms and has as conclusion
some axiom α ∈ P⁻ˣ (where a particular forgetting operation is used to com-
pute P⁻ˣ). Intuitively, α is obtained from P by performing inferences on x. To
compute a forgetting-based proof, we have to forget the names occurring in the
ontology one after the other, until only the names occurring in the statement
to be proved are left. For the forgetting operation, the user can select between
two implementations: LETHE [17] (using the method supporting ALCH) and
FAME [21] (using the method supporting ALCOI). Since the space of possible
inference steps is exponentially large, it is not feasible to minimize proofs after
their computation, as we do for EL entailments, which is why we rely on heuris-
tics and search algorithms to generate small proofs. Specifically, we implemented
three methods for computing forgetting-based proofs: HEUR tries to find proofs
fast, SYMB tries to minimize the number of predicates forgotten in a proof, with
the aim of obtaining proofs of small depth, and SIZE tries to optimize the size of
the proof. The heuristic method HEUR is described in [3], and its implementation
has not been changed since then. The search methods SYMB and SIZE are new
(details can be found in the extended version [1]).
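To illustrate the idea of forgetting names one after the other, consider the toy fragment in which every axiom is an atomic CI A ⊑ B, written as a pair `(A, B)`. For this fragment only, forgetting x amounts to bypassing x: every pair A ⊑ x and x ⊑ B is replaced by A ⊑ B. The sketch below is hypothetical and much simpler than LETHE or FAME; it forgets names in alphabetical order instead of using HEUR, SYMB, or SIZE, and it records each step as (forgotten name, premises P, new conclusions obtained by forgetting x in P).

```python
def forget(ontology, x):
    """Forget concept name x from a set of atomic CIs (A, B), read as A ⊑ B.

    Only valid for this toy fragment: every A ⊑ x and x ⊑ B become A ⊑ B.
    """
    below = {a for a, b in ontology if b == x and a != x}
    above = {b for a, b in ontology if a == x and b != x}
    kept = {(a, b) for a, b in ontology if x not in (a, b)}
    return kept | {(a, b) for a in below for b in above if a != b}


def forgetting_proof(ontology, goal):
    """Forget every name not occurring in `goal`, one after the other.

    Returns the steps (x, premises, conclusions), or None if the goal is not
    among the remaining axioms at the end.
    """
    current = set(ontology)
    to_forget = sorted({n for axiom in current for n in axiom} - set(goal))
    steps = []
    for x in to_forget:
        premises = {axiom for axiom in current if x in axiom}
        result = forget(current, x)
        steps.append((x, premises, result - current))
        current = result
    return steps if goal in current else None


for step in forgetting_proof({("A", "B"), ("B", "C"), ("C", "D")}, ("A", "D")):
    print(step[0], sorted(step[2]))
# B [('A', 'C')]
# C [('A', 'D')]
```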


4.2 Signature-Based Proof Condensation 


When inspecting a proof over a real-world ontology, different parts of the proof 
will be more or less familiar to the user, depending on their knowledge about 
the involved concepts or their experience with similar inference steps in the past. 
For CIs between concepts for which a user has application knowledge, they may 
not need to see a proof, and consequently, sub-proofs for such axioms can be 
automatically hidden. We assume that the user’s knowledge is given in the form
of a known signature Σ and that axioms that contain only symbols from Σ do
not need to be explained. The effect can be seen in Fig. 1 through the “known”
inference on the left, where Σ contains SebaceousGland and Gland. The known
signature is taken into consideration when minimizing the proofs, so that proofs
are selected for which more of the known information can be used if convenient.
This can be easily integrated into the Dijkstra approach described in [3], by
initially assigning to each axiom covered by Σ a proof with a single vertex.
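Hiding sub-proofs of axioms over the known signature can be sketched on a fixed proof tree. In this hypothetical Python helper, a proof tree is a pair `(axiom, subtrees)` and each axiom is an atomic CI written as a pair of concept names, so its signature is simply the set of names it contains; the concept names in the example are illustrative. This only prunes a given proof, whereas the approach described above also takes the known signature into account while minimizing.

```python
def condense(tree, known):
    """Hide the sub-proof of every axiom whose symbols all belong to `known`."""
    axiom, subtrees = tree
    if subtrees and set(axiom) <= known:
        return (axiom, [])              # keep the axiom, drop its justification
    return (axiom, [condense(t, known) for t in subtrees])


def size(tree):
    """Number of vertices (axioms) in a proof tree."""
    _, subtrees = tree
    return 1 + sum(size(t) for t in subtrees)


def leaf(a, b):
    return ((a, b), [])


# Hypothetical proof of SebaceousGland ⊑ AnatomicalStructure.
proof = (("SebaceousGland", "AnatomicalStructure"), [
    (("SebaceousGland", "Gland"), [leaf("SebaceousGland", "ExocrineGland"),
                                   leaf("ExocrineGland", "Gland")]),
    leaf("Gland", "AnatomicalStructure"),
])
known = {"SebaceousGland", "Gland"}
print(size(proof), size(condense(proof, known)))  # 5 3
```

If the conclusion itself uses only known symbols, the whole proof collapses to a single vertex.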


5 Evaluation 


For EVONNE to be usable in practice, it is vital that proofs are computed effi- 
ciently and that they are not too large. An experimental evaluation of minimized 
proofs for EL and forgetting-based proofs obtained with FAME and LETHE is pro- 
vided in [3]. We here present an evaluation of additional aspects: 1) a comparison 
of the three methods for computing forgetting-based proofs, and 2) an evaluation
of the impact of signature-based proof condensation. All experiments were
performed on Debian Linux (Intel Core i5-4590, 3.30 GHz, 23 GB Java heap size). 


5.1 Minimal Forgetting-Based Proofs 


To evaluate forgetting-based proofs, we extracted ALCH “proof tasks” from the
ontologies in the 2017 snapshot of BioPortal [19]. We restricted all ontologies 
Fig. 2. Run times and proof sizes for different forgetting-based proof methods. Marker
size indicates how often each pattern occurred in the BioPortal snapshot. Instances 
that timed out were assigned size 0. 


to ALCH and collected all entailed atomic CIs α, for each of which we computed
the union U of all its justifications. We identified pairs (α, U) that were isomor-
phic modulo renaming of predicates, and kept only those patterns (α, U) that
contained at least one axiom not expressible in ELH. This was successful in 373
of the ontologies² and resulted in 138 distinct justification patterns (α, U), repre-
senting 327 different entailments in the BioPortal snapshot. We then computed
forgetting-based proofs for U ⊨ α with our three methods using LETHE, with a
5-minute timeout. This was successful for 325/327 entailments for the heuristic
method (HEUR), 317 for the symbol-minimizing method (SYMB), and 279 for the
size-minimizing method (SIZE). In Fig. 2 we compare the resulting proof sizes
(left) and the run times (right), using HEUR as baseline (x-axis). HEUR is indeed
faster in most cases, but SIZE reduces proof size by 5% on average compared to 
HEUR, which is not the case for SYMB. Regarding proof depth (not shown in the 
figure), SYMB did not outperform HEUR on average, while SIZE surprisingly yielded 
an average reduction of 4% compared to HEUR. Despite this good performance of 
SIZE for proof size and depth, for entailments that depend on many or complex 
axioms, computation times for both SYMB and SIZE become unacceptable, while 
proof generation with HEUR mostly stays in the area of seconds. 
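The grouping of entailments into patterns modulo renaming can be illustrated with a brute-force isomorphism test. The sketch below is hypothetical: each axiom is a tuple whose first element is a fixed constructor tag and whose remaining elements are predicate names, and a renaming is any bijection between the predicate names of the two patterns (it does not keep concept names and role names apart, as a faithful implementation would). Trying all permutations is only feasible for small patterns.

```python
from itertools import permutations


def names(pattern):
    """Sorted predicate names of a pattern (goal, axioms)."""
    goal, axioms = pattern
    return sorted({n for ax in (goal, *axioms) for n in ax[1:]})


def rename(ax, mapping):
    """Apply a renaming to the predicate names of an axiom, keeping its tag."""
    return (ax[0],) + tuple(mapping[n] for n in ax[1:])


def isomorphic(p, q):
    """True if some bijective renaming maps pattern p onto pattern q."""
    (g1, u1), (g2, u2) = p, q
    n1, n2 = names(p), names(q)
    if len(n1) != len(n2) or len(u1) != len(u2):
        return False
    target = set(u2)
    for image in permutations(n2):
        m = dict(zip(n1, image))
        if rename(g1, m) == g2 and {rename(a, m) for a in u1} == target:
            return True
    return False


p = (("sub", "A", "C"), frozenset({("sub", "A", "B"), ("sub", "B", "C")}))
q = (("sub", "X", "Z"), frozenset({("sub", "Y", "Z"), ("sub", "X", "Y")}))
print(isomorphic(p, q))  # True
```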


5.2 Signature-Based Proof Condensation 


To evaluate how much hiding proof steps in a known signature decreases proof 
size in practice, we ran experiments on the large medical ontology SNOMED CT 
(International Edition, July 2020) that is mostly formulated in ELH.³ As signa-
tures we used SNOMED CT Reference Sets⁴, which are restricted vocabularies


2 The other ontologies could not be processed in this way within the memory limit.
3 https://www.snomed.org/.
4 https://confluence.ihtsdotools.org/display/DOCRFSPG/2.3.+Reference+Set.
Fig. 3. Size of original and condensed proofs (left). Ratio of proof size depending on
the signature coverage (right). 


for specific use cases. We extracted justifications similarly to the previous exper- 
iment, but did not rename predicates and considered only proof tasks that use 
at least 5 symbols from the signature, since otherwise no improvement can be 
expected by using the signatures. For each signature, we randomly selected 500
out of 6,689,452 proof tasks (if at least 500 existed). This left the 4 reference
sets General Practitioner/Family Practitioner (GPFP), Global Patient Set (GPS),
International Patient Summary (IPS), and the one included in the SNOMED CT
distribution (DEF). For each of the resulting 2,000 proof tasks, we used ELK [16]
and our proof minimization approach to obtain (a) a proof of minimal size and
(b) a proof of minimal size after hiding the selected signature. The distribution
of proof sizes can be seen in Fig. 3. In 770/2,000 cases, a smaller proof was gener-
ated when using the signature. In 91 of these cases, the size was even reduced
to 1, i.e. the target axiom used only the given signature and therefore nothing
else needed to be shown. In the other 679 cases with reduced size, the average
ratio of reduced size to original size was 0.68–0.93 (depending on the signature).
One can see that this ratio is correlated with the signature coverage of the origi-
nal proof (i.e. the ratio of signature symbols to total symbols in the proof), with
a weak or strong correlation depending on the signature (r between −0.26 and
−0.74). However, a substantial number of proofs with relatively high signature
coverage could still not be reduced in size at all (see the top right of the right
diagram). In summary, signature-based condensation can be useful, but its benefit
depends on the proof task and the signature. We also conducted experiments on
the Galen ontology⁵ with comparable results (see the extended version of this
paper [1]).
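The signature coverage and the correlation coefficient r can be computed as follows. This is a hypothetical sketch: it assumes that coverage counts distinct symbols, with each axiom given as a tuple of symbol names, and it uses the standard Pearson formula. The sample values below are made up and are not taken from the experiments.

```python
from math import sqrt


def coverage(proof_axioms, signature):
    """Share of the distinct symbols in a proof that belong to the signature."""
    symbols = {s for axiom in proof_axioms for s in axiom}
    return len(symbols & signature) / len(symbols)


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up coverages and size ratios for three proofs: higher coverage, smaller ratio.
print(pearson([0.2, 0.5, 0.8], [0.95, 0.8, 0.6]) < 0)  # True
```

A negative r, as reported above, means that proofs with higher coverage tend to shrink more.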


5 https://bioportal.bioontology.org/ontologies/GALEN. 
6 Conclusion


We have presented and compared the proof generation and presentation methods 
used in EVONNE, a visual tool for explaining entailments of DL ontologies. While 
these methods produce smaller or less deep proofs, which are thus easier to 
present, there is still room for improvement. Specifically, as the forgetting-based
proofs do not provide the same degree of detail as the ELK proofs, it would be
desirable to also support methods for more expressive DLs that generate proofs
with smaller inference steps. Moreover, our current evaluation focuses on proof
size and depth; to understand how well EVONNE helps users to understand
DL entailments, we would also need a qualitative evaluation of the tool with
potential end-users. We are also working on explanations for non-entailments
using countermodels [7] and a plugin for the ontology editor Protégé that is
compatible with the PULi library and Proof Explanation plugin presented in [15],
which will support all proof generation methods discussed here and more.⁶
Abstract. We present new results on the application of semantic- and
knowledge-based reasoning techniques to the analysis of cloud deployments,
in particular to the security of Infrastructure as Code configuration files,
encoded as description logic knowledge bases. We introduce an
action language to model mutating actions; that is, actions that change 
the structural configuration of a given deployment by adding, modifying, 
or deleting resources. We mainly focus on two problems: the problem of 
determining whether the execution of an action, no matter the parame- 
ters passed to it, will not cause the violation of some security requirement 
(static verification), and the problem of finding sequences of actions that 
would lead the deployment to a state where (un)desirable properties are 
(not) satisfied (plan existence and plan synthesis). For all these problems, 
we provide definitions, complexity results, and decision procedures. 


1 Introduction 


The use of automated reasoning techniques to analyze the properties of cloud 
infrastructure is gaining increasing attention [4–7,18]. Despite that, more effort
needs to be put into the modeling and verification of generic security require- 
ments over cloud infrastructure pre-deployment. The availability of formal tech- 
niques, providing strong security guarantees, would assist complex system-level 
analyses such as threat modeling and data flow, which now require considerable 
time, manual intervention, and expert domain knowledge. 

We continue our research on the application of semantic-based and 
knowledge-based reasoning techniques to cloud deployment Infrastructure as 
Code configuration files. In [14], we reported on our experience using expressive 
description logics to model and reason about Amazon Web Services’ proprietary 
Infrastructure as Code framework (AWS CloudFormation). We used the rich 
constructs of these logics to encode domain knowledge, simulate closed-world 
reasoning, and express mitigations and exposures to security threats. Due to the 
high complexity of basic tasks [3,26], we found reasoning in such a framework
to be inefficient at cloud scale. In [15], we introduced core-closed knowledge
bases, a lightweight description logic combining closed- and open-world
reasoning that is tailored to model cloud infrastructure and efficiently query its
security properties. Core-closed knowledge bases enable partially-closed predi- 
cates whose interpretation is closed over a core part of the knowledge base but 
open elsewhere. To encode potential exposure to security threats, we studied the 
query satisfiability problem and (together with the usual query entailment prob- 
lem) applied it to a new class of conjunctive queries that we called Must/May 
queries. We were able to answer such queries over core-closed knowledge bases in 
LOGSPACE in data complexity and NP in combined complexity, improving the 
required NEXPTIME complexity for satisfiability over ALCOIQ (used in [14]).

Here, we enhance the quality of the analyses done over pre-deployment arti- 
facts, giving users and practitioners additional precise insights on the impact 
of potential changes, fixes, and general improvements to their cloud projects. 
We enrich core-closed knowledge bases with the notion of core-completeness, 
which is needed to ensure that updates are consistent. We define the syntax and 
semantics of an action language that is expressive enough to encode mutating 
API calls, i.e., operations that change a cloud deployment configuration by
creating new resources or by modifying or deleting existing ones. As part of our effort to improve
the quality of automated analysis, we also provide relevant reasoning tools to 
identify and predict the consequences of these changes. To this end, we consider 
procedures that determine whether the execution of a mutating action always 
preserves given properties (static verification); determine whether there exists a 
sequence of operations that would lead a deployment to a configuration meet- 
ing certain requirements (plan existence); and find such sequences of operations 
(plan synthesis). 

The paper is organized as follows. In Sect. 2, we provide background on core- 
closed knowledge bases, conjunctive queries, and Must/May queries. In Sect. 3, 
we motivate and introduce the notion of core-completeness. In Sect. 4, we define
the action language. In Sect. 5, we describe the static verification problem and
characterize its complexity. In Sect. 6, we address the planning problem and
concentrate on the synthesis of minimal plans satisfying a given requirement
expressed using Must/May queries. We discuss related work in Sect. 7 and
conclude in Sect. 8. Results and proofs omitted in this paper can be found
in the full version [16].


2 Background 


Description logics (DLs) are a family of logics for encoding knowledge in terms of 
concepts, roles, and individuals; analogous to first-order logic unary predicates, 
binary predicates, and constants, respectively. Standard DL knowledge bases 
(KBs) have a set of axioms, called the TBox, and a set of assertions, called the ABox.
The TBox contains axioms that relate to concepts and roles. The ABox contains 
assertions that relate individuals to concepts and pairs of individuals to roles. 
KBs are usually interpreted under the open-world assumption, meaning that the 
asserted facts are not assumed to be complete. 
Core-Closed Knowledge Bases. In [15], we introduced core-closed knowledge
bases (ccKBs) as a suitable description logic formalism to encode cloud deploy-
ments. The main characteristic of ccKBs is that they allow for a combination of
open- and closed-world reasoning that ensures tractability. A DL-Lite^F ccKB is a
tuple K = (T, A, S, M) built from the standard knowledge base (T, A) and the
core system (S, M). The former encodes incomplete terminological and asser-
tional knowledge. The latter is, in turn, composed of two parts: S (also called
the SBox), containing axioms that encode the core structural specifications,
and M (also called the MBox), containing positive concept and role assertions
that encode the core configuration. Syntactically, M is similar to an ABox but,
semantically, it is assumed to be complete with respect to the specifications in S.

The ccKB K is defined over the alphabets C (of concepts), R (of roles), and I
(of individuals), all partitioned into an open subset and a partially-closed subset.
That is, the set of concepts is partitioned into the open concepts C_K and the
closed (specification) concepts C_S; the set of roles is partitioned into open roles
R_K and closed (specification) roles R_S; and the set of individuals is partitioned
into open individuals I_K and closed (model) individuals I_M. We call C_S and R_S
core-closed predicates, or partially-closed predicates, as their extension is closed
over the core domain I_M and open otherwise. In contrast, we call C_K and R_K
open predicates. The syntax of concept and role expressions in DL-Lite^F [2,8]
is as follows:


B ::= ⊥ | A | ∃p


where A denotes a concept name and p is either a role name r or its inverse r⁻.
The syntax provides for the following three kinds of axioms:


B1 ⊑ B2        B1 ⊑ ¬B2        (funct p),


respectively called positive inclusion axioms, negative inclusion axioms, and
functionality axioms. These axioms are contained in the sets S and T. To
precisely denote the subsets of S and T having only axioms of a given type, we use
the notation PI_Y, NI_Y, and F_Y, for Y ∈ {S, T}, which respectively contain only
positive inclusion axioms, negative inclusion axioms, and functionality axioms.
From now on, we denote symbols from the alphabet X_K with the subscript K, and
symbols from the generic alphabet X with no subscript. In core-closed knowledge
bases, axioms and assertions fall into the scope of a different set depending on
the predicates and individuals that they refer to, according to the set
definitions below.


M ⊆ {A_S(a_M), R_S(a_M, a), R_S(a, a_M)}
A ⊆ {A_K(a_K), R_K(a_K, b_K), A_S(a_K), R_S(a_K, b_K)}
S ⊆ {B¹_S ⊑ B²_S, B¹_S ⊑ ¬B²_S, Func(P_S)}
T ⊆ {B¹ ⊑ B²_K, B¹ ⊑ ¬B²_K, Func(P_K)}
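The set definitions above determine, for each assertion, whether it belongs to M or to A. A hypothetical Python sketch, assuming assertions are tuples `(predicate, individual)` or `(predicate, individual1, individual2)` and that every individual outside the core domain is open; predicate and individual names such as `Bucket` or `b1` are illustrative only.

```python
def classify(assertion, closed_preds, core_inds):
    """Return "M", "A", or None (the assertion fits neither set)."""
    pred, *inds = assertion
    touches_core = any(i in core_inds for i in inds)
    if pred in closed_preds and touches_core:
        return "M"      # A_S(a_M), R_S(a_M, a), R_S(a, a_M)
    if not touches_core:
        return "A"      # A_K(a_K), R_K(a_K, b_K), A_S(a_K), R_S(a_K, b_K)
    return None         # an open predicate applied to a core individual


closed = {"Bucket", "hasPolicy"}
core = {"b1"}
print(classify(("hasPolicy", "b1", "p"), closed, core))  # M
print(classify(("Bucket", "x"), closed, core))           # A
```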


In the above definition of the set M, role assertions link at least one individual
from the core domain I_M (denoted as a_M) to one individual from the general set
I (denoted as a). Node a could be either an individual from the open partition I_K
or from the closed partition I_M. When a is an element of the set I_K, we refer to it
as a “boundary node”, as it sits at the boundary between the core and the open
parts of the knowledge base. As mentioned earlier, M-assertions are assumed to
be complete and consistent with respect to the terminological knowledge given
in S, whereas the usual open-world assumption is made for A-assertions. The
semantics of a DL-Lite^F core-closed KB is given in terms of interpretations 𝓘,
consisting of a non-empty domain Δ^𝓘 and an interpretation function ·^𝓘. The
latter assigns to each concept A a subset A^𝓘 of Δ^𝓘, to each role r a subset r^𝓘 of
Δ^𝓘 × Δ^𝓘, and to each individual a a node a^𝓘 in Δ^𝓘, and it is extended to concept
expressions in the usual way. An interpretation 𝓘 is a model of an inclusion axiom
B1 ⊑ B2 if B1^𝓘 ⊆ B2^𝓘. An interpretation 𝓘 is a model of a membership assertion
A(a) (resp. r(a, b)) if a^𝓘 ∈ A^𝓘 (resp. (a^𝓘, b^𝓘) ∈ r^𝓘). We say that 𝓘 models T,
S, and A if it models all axioms or assertions contained therein. We say that 𝓘
models M, denoted 𝓘 ⊨_CWA M, when it models an M-assertion f if and only
if f ∈ M. Finally, 𝓘 models K if it models T, S, A, and M. When K has at
least one model, we say that K is satisfiable.
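For a finite interpretation in which every domain element is named by an individual and names denote themselves (an assumption made only for this sketch), the closed-world condition on M (an interpretation models an M-assertion f if and only if f ∈ M) can be checked by comparing the M-shaped facts true in the interpretation with M itself. The helpers and the example names below are hypothetical.

```python
def is_m_shaped(fact, closed_preds, core_inds):
    """An M-assertion uses a closed predicate and mentions a core individual."""
    pred, *inds = fact
    return pred in closed_preds and any(i in core_inds for i in inds)


def models_mbox(facts, mbox, closed_preds, core_inds):
    """`facts` are the atoms true in the interpretation; the M-shaped ones
    must be exactly the assertions listed in `mbox`."""
    true_m_facts = {f for f in facts if is_m_shaped(f, closed_preds, core_inds)}
    return true_m_facts == set(mbox)


closed = {"Bucket", "hasPolicy"}
core = {"b1"}
mbox = {("Bucket", "b1"), ("hasPolicy", "b1", "p1")}
facts = mbox | {("Public", "p1")}          # open predicates may add facts freely
print(models_mbox(facts, mbox, closed, core))                                   # True
print(models_mbox(facts | {("hasPolicy", "b1", "p2")}, mbox, closed, core))     # False
```

Closed predicates remain open away from the core: a fact such as hasPolicy(p1, p2), which mentions no core individual, does not affect the check.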

In the remainder of this paper, we will sometimes refer to the lts interpreta-
tion of M. The lts interpretation of M, denoted lts(M), is the interpretation
(Δ^lts(M), ·^lts(M)) defined only over concept and role names from the sets C_S and
R_S, respectively, and over individual names from I that appear in the scope
of M-assertions. The interpretation lts(M) is the unique model of M such that
lts(M) ⊨_CWA M.

In the application presented in [14], description logic KBs are used to encode machine-readable deployment files containing multiple resource declarations. Every resource declaration has an underlying tree structure, whose leaves can potentially link to the roots of other resource declarations. Let $I^R \subseteq I^{\mathcal{M}}$ be the set of all resource nodes. We encode their resource declarations in $\mathcal{M}$ and formalize the resulting forest structure by partitioning $\mathcal{M}$ into multiple subsets $\{\mathcal{M}_i\}_{i \in I^R}$, each representing a tree of assertions rooted at a resource node $i$ (we generally refer to constants in $\mathcal{M}$ as nodes). For the purpose of this work, we will refer to core-closed knowledge bases where $\mathcal{M}$ is partitioned as described; that is, ccKBs such that $\mathcal{K} = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \{\mathcal{M}_i\}_{i \in I^R})$.
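The forest structure can be illustrated with a small sketch. It assumes a hypothetical tuple encoding of assertions, `("A", a)` for A(a) and `("R", a, b)` for R(a, b), and a hypothetical helper `partition_mbox` that, for each resource node i, collects the assertions reachable from i by following role assertions forward without descending into other resource roots. A role assertion whose object is another root stays in the source tree, mirroring leaves that link to other declarations. This is one possible reading of the partition, not the encoding used in [14].

```python
from collections import defaultdict, deque
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical encoding: ("A", a) stands for A(a), ("R", a, b) for R(a, b).
Assertion = Tuple[str, ...]


def partition_mbox(m: Iterable[Assertion], roots: Set[str]) -> Dict[str, FrozenSet[Assertion]]:
    """Split M into subsets M_i, one tree of assertions per resource node i."""
    facts = list(m)
    successors = defaultdict(list)
    for fact in facts:
        if len(fact) == 3:  # role assertion R(a, b) gives the edge a -> b
            successors[fact[1]].append(fact[2])

    parts: Dict[str, FrozenSet[Assertion]] = {}
    for root in roots:
        reached = {root}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for nxt in successors[node]:
                # Stop at other resource roots: they start their own tree.
                if nxt not in reached and nxt not in roots:
                    reached.add(nxt)
                    queue.append(nxt)
        # An assertion belongs to the tree that reached its first individual.
        parts[root] = frozenset(f for f in facts if f[1] in reached)
    return parts


m = {("S3::Bucket", "b1"), ("logging", "b1", "lc"), ("dest", "lc", "b2"), ("S3::Bucket", "b2")}
parts = partition_mbox(m, {"b1", "b2"})
# parts["b1"] holds S3::Bucket(b1), logging(b1, lc), dest(lc, b2); parts["b2"] holds S3::Bucket(b2)
```

The predicate names `logging` and `dest` are placeholders chosen for the example.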


Conjunctive Queries. A conjunctive query (CQ) is an existentially-quantified formula $q[\vec{x}]$ of the form $\exists\vec{y}.\,conj(\vec{x}, \vec{y})$, where $conj$ is a conjunction of positive atoms and potentially inequalities. A union of conjunctive queries (UCQ) is a disjunction of CQs. The variables in $\vec{x}$ are called answer variables; those in $\vec{y}$ are the existentially-quantified query variables. A tuple $\vec{c}$ of constants appearing in the knowledge base $\mathcal{K}$ is an answer to $q$ if, for every interpretation $\mathcal{I}$ that is a model of $\mathcal{K}$, we have $\mathcal{I} \models q[\vec{c}]$. We call these tuples the certain answers of $q$ over $\mathcal{K}$, denoted $ans(\mathcal{K}, q)$, and the problem of testing whether a tuple is a certain answer is called query entailment. A tuple $\vec{c}$ of constants appearing in $\mathcal{K}$ satisfies $q$ if there exists an interpretation $\mathcal{I}$ that is a model of $\mathcal{K}$ such that $\mathcal{I} \models q[\vec{c}]$. We call these tuples the sat answers of $q$ over $\mathcal{K}$, denoted $sat\text{-}ans(\mathcal{K}, q)$, and the problem of testing whether a given tuple is a sat answer is called query satisfiability.
Must/May Queries. A MUST/MAY query $\psi$ [15] is a Boolean combination of nested UCQs in the scope of a MUST or a MAY operator as follows:

$$\psi ::= \neg\psi \mid \psi_1 \wedge \psi_2 \mid \psi_1 \vee \psi_2 \mid \mathrm{MUST}\,\varphi \mid \mathrm{MAY}\,\varphi$$

where $\varphi$ is a union of conjunctive queries potentially containing inequalities. The reasoning needed for answering the nested queries can be decoupled from the reasoning needed to answer the higher-level formula: nested queries $\mathrm{MUST}\,\varphi$ are reduced to conjunctive query entailment, and nested queries $\mathrm{MAY}\,\varphi$ are reduced to conjunctive query satisfiability. We denote by $\mathrm{ANS}(\psi, \mathcal{K})$ the answers of a MUST/MAY query $\psi$ over the core-closed knowledge base $\mathcal{K}$.
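Because the nested queries are answered independently of the Boolean structure, the top-level evaluation can be sketched with two oracles: `entails` for MUST (certain answers) and `satisfiable` for MAY (sat answers). The nested-tuple formula encoding, the oracle interface, and the candidate tuples are all hypothetical; the oracles below are toy lookup tables standing in for a DL-Lite reasoner.

```python
from typing import Callable, Iterable, Set, Tuple

Answer = Tuple[str, ...]
# Oracle(nested_query_name, tuple) -> bool (hypothetical interface).
Oracle = Callable[[str, Answer], bool]


def holds(psi: tuple, t: Answer, entails: Oracle, satisfiable: Oracle) -> bool:
    """Evaluate a Must/May formula, encoded as nested tuples, for a candidate tuple t."""
    op = psi[0]
    if op == "MUST":   # t must be a certain answer of the nested UCQ
        return entails(psi[1], t)
    if op == "MAY":    # t must be a sat answer of the nested UCQ
        return satisfiable(psi[1], t)
    if op == "not":
        return not holds(psi[1], t, entails, satisfiable)
    if op == "and":
        return holds(psi[1], t, entails, satisfiable) and holds(psi[2], t, entails, satisfiable)
    if op == "or":
        return holds(psi[1], t, entails, satisfiable) or holds(psi[2], t, entails, satisfiable)
    raise ValueError(f"unknown operator {op!r}")


def must_may_answers(psi: tuple, candidates: Iterable[Answer],
                     entails: Oracle, satisfiable: Oracle) -> Set[Answer]:
    return {t for t in candidates if holds(psi, t, entails, satisfiable)}


# Toy oracles: "public" is certain for b1 and merely possible for b2.
certain = {("public", ("b1",))}
possible = certain | {("public", ("b2",))}
psi = ("and", ("MAY", "public"), ("not", ("MUST", "public")))
result = must_may_answers(psi, [("b1",), ("b2",), ("b3",)],
                          lambda q, t: (q, t) in certain,
                          lambda q, t: (q, t) in possible)
# result == {("b2",)}: possibly public, but not known to be public
```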


3 Core-Complete Knowledge Bases 


The algorithm Consistent presented in [15] computes satisfiability of DL-Lite$_\mathcal{F}$ core-closed knowledge bases relying on the assumption that $\mathcal{M}$ is complete and
consistent with respect to S. Such an assumption effectively means that the infor- 
mation contained in M is explicitly present and cannot be completed by inference. 
The algorithm relies on the existence of a theoretical object, the canonical inter- 
pretation, in which missing assertions can always be introduced when they are 
logically implied by the positive inclusion axioms. As a matter of fact, positive 
inclusion axioms are not even included in the inconsistency formula built for 
the satisfiability check, as it is proven that the canonical interpretation always 
satisfies them ([15], Lemma 3). When the assumption that M is consistent with 
respect to S is dropped, the algorithm Consistent becomes insufficient to check 
satisfiability. We illustrate this with an example. 


Example 1 (Required Configuration). Let us consider the axioms constraining the AWS resource type S3::Bucket, in particular the $\mathcal{S}$-axiom $\mathit{S3{::}Bucket} \sqsubseteq \exists\mathit{loggingConfiguration}$, prescribing that all buckets must have a logging configuration. For a set $\mathcal{M} = \{\mathit{S3{::}Bucket}(b)\}$, according to the partially-closed semantics of core-closed knowledge bases, the absence of an assertion $\mathit{loggingConfiguration}(b, x)$, for some $x$, is interpreted as the assertion being false in $\mathcal{M}$, which is therefore not consistent with respect to $\mathcal{S}$. However, the algorithm Consistent will check the lts interpretation of $\mathcal{M}$ against an empty formula (as there are no negative inclusion or functionality axioms) and return true.


In essence, the algorithm Consistent does not compute the full satisfiability of the 
whole core-closed knowledge base, but only of its open part. Satisfiability of M 
with respect to the positive inclusion axioms in S needs to be checked separately. 
We introduce a new notion, distinct from the notion of consistency, to denote when a set $\mathcal{M}$ is complete with respect to $\mathcal{S}$. Let $\mathcal{K} = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M})$ be a DL-Lite$_\mathcal{F}$ core-closed knowledge base. We say that $\mathcal{K}$ is core-complete when $\mathcal{M}$ models all positive inclusion axioms in $\mathcal{S}$ under a closed-world assumption; we say that $\mathcal{K}$ is open-consistent when $\mathcal{M}$ and $\mathcal{A}$ model all negative inclusion and functionality axioms in $\mathcal{K}$'s negative inclusion closure. Finally, we say that $\mathcal{K}$ is fully satisfiable when it is both core-complete and open-consistent.
Lemma 1. In order to check full satisfiability of a DL-Lite$_\mathcal{F}$ core-closed KB $\mathcal{K}$, one simply needs to check whether $\mathcal{K}$ is core-complete (that is, whether $\mathcal{M}$ models all positive axioms in $\mathcal{S}$ under a closed-world assumption) and whether $\mathcal{K}$ is open-consistent (that is, to run the algorithm Consistent).


Proof. Dropping the assumption that $\mathcal{M}$ is consistent w.r.t. $\mathcal{S}$ causes Lemma 3 from [15] to fail. In particular, the canonical interpretation of $\mathcal{K}$, $can(\mathcal{K})$, would still be a model of $PI_\mathcal{T}$, $\mathcal{A}$, and $\mathcal{M}$, but may not be a model of $PI_\mathcal{S}$. This is due to the construction of the canonical model, which is based on the notion of applicable axioms. In rules c5-c8 of [15] Definition 1, axioms in $PI_\mathcal{S}$ are defined as applicable to assertions involving open nodes $a_x$ but not to model nodes $a_m$ in $I^{\mathcal{M}}$. As a result, if the implications of such axioms on model nodes are not included in $\mathcal{M}$ itself, then they will not be included in $can(\mathcal{K})$ either, and $can(\mathcal{K})$ will not be a model of $PI_\mathcal{S}$. On the other hand, one can easily verify that Lemmas 1, 2, 4, 5, 6, 7 and Corollary 1 of [15] still hold, as they do not rely on the assumption. However, since it is no longer guaranteed that $\mathcal{M}$ satisfies all positive inclusion axioms from $\mathcal{S}$, the if direction of [15] Theorem 1 does not hold anymore: there can be an unsatisfiable ccKB $\mathcal{K}$ such that $db(\mathcal{A}) \cup lts(\mathcal{M}) \models cln(\mathcal{T} \cup \mathcal{S}), \mathcal{A}, \mathcal{M}$, for instance, the knowledge base from Example 1. We also note that the negative inclusion and functionality axioms from $\mathcal{S}$ are checked anyway by the consistency formula, both on $db(\mathcal{A})$ and on $lts(\mathcal{M})$.


Lemma 2. Checking whether a DL-Lite$_\mathcal{F}$ core-closed knowledge base is core-complete can be done in polynomial time in $\mathcal{M}$. As a consequence, checking full satisfiability can also be done in polynomial time in $\mathcal{M}$.


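Proof. One can write an algorithm that checks core-completeness by searching for a positive inclusion axiom $B_1^S \sqsubseteq B_2^S \in PI_\mathcal{S}$ and a model node $a_m$ such that $\mathcal{M} \models B_1^S(a_m)$ and $\mathcal{M} \not\models B_2^S(a_m)$, where the relation $\models$ is defined over DL-Lite$_\mathcal{F}$ concept expressions as follows:

$$
\begin{aligned}
\mathcal{M} \models \bot(a_m) &\iff \mathit{false}\\
\mathcal{M} \models A_S(a_m) &\iff A_S(a_m) \in \mathcal{M}\\
\mathcal{M} \models \exists R_S(a_m) &\iff \exists b.\ R_S(a_m, b) \in \mathcal{M}\\
\mathcal{M} \models \exists R_S^-(a_m) &\iff \exists b.\ R_S(b, a_m) \in \mathcal{M}
\end{aligned}
$$

The knowledge base is core-complete if no such axiom and node can be found.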
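The check in this proof can be sketched directly. The sketch assumes a hypothetical tuple encoding, `("A", a)` / `("R", a, b)` for assertions and `("bot",)`, `("atom", A)`, `("exists", R)`, `("exists_inv", R)` for basic concepts. It implements the four cases of the closed-world relation and searches for a violating pair of axiom and model node. Each membership test scans $\mathcal{M}$ at most once, so the whole search is polynomial in $\mathcal{M}$, in line with Lemma 2.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

Fact = Tuple[str, ...]   # hypothetical encoding: ("A", a) for A(a), ("R", a, b) for R(a, b)
Basic = Tuple[str, ...]  # ("bot",) | ("atom", A) | ("exists", R) | ("exists_inv", R)


def entails(m: FrozenSet[Fact], concept: Basic, node: str) -> bool:
    """Closed-world relation M |= B(a_m) from the proof of Lemma 2."""
    kind = concept[0]
    if kind == "bot":
        return False
    if kind == "atom":
        return (concept[1], node) in m
    if kind == "exists":      # some R(a_m, b) in M
        return any(len(f) == 3 and f[0] == concept[1] and f[1] == node for f in m)
    if kind == "exists_inv":  # some R(b, a_m) in M
        return any(len(f) == 3 and f[0] == concept[1] and f[2] == node for f in m)
    raise ValueError(f"unknown basic concept {concept!r}")


def completeness_violation(m: FrozenSet[Fact], positive_inclusions: Iterable[Tuple[Basic, Basic]],
                           model_nodes: Iterable[str]) -> Optional[Tuple[Basic, Basic, str]]:
    """Return a witness (B1, B2, a_m) with M |= B1(a_m) and not M |= B2(a_m), or None."""
    nodes = sorted(model_nodes)
    for b1, b2 in positive_inclusions:
        for node in nodes:
            if entails(m, b1, node) and not entails(m, b2, node):
                return b1, b2, node
    return None


def is_core_complete(m, positive_inclusions, model_nodes) -> bool:
    return completeness_violation(m, positive_inclusions, model_nodes) is None


# Example 1: S3::Bucket <= exists loggingConfiguration, M = {S3::Bucket(b)}.
pi_s = [(("atom", "S3::Bucket"), ("exists", "loggingConfiguration"))]
m1 = frozenset({("S3::Bucket", "b")})
m2 = m1 | {("loggingConfiguration", "b", "c")}
```

With these inputs, `is_core_complete(m1, pi_s, {"b"})` is false (the situation of Example 1), while adding the logging assertion in `m2` restores core-completeness.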


4 Actions 


We now introduce a formal language to encode mutating actions. Let us remind 
ourselves that, in our application of interest, the execution of a mutating action 
modifies the configuration of a deployment by either adding new resource 
instances, deleting existing ones, or modifying their settings. Here, we introduce a framework for DL-Lite$_\mathcal{F}$ core-closed knowledge base updates, triggered by the execution of an action, that enables all the above-mentioned effects. The only component of the core-closed knowledge base that is modified by the action execution is $\mathcal{M}$, while $\mathcal{T}$, $\mathcal{S}$, and $\mathcal{A}$ remain unchanged. As a consequence of updating $\mathcal{M}$, actions can introduce new individuals and delete old ones, thus updating the set $I^{\mathcal{M}}$ as well. Note that this may force changes outside $I^{\mathcal{M}}$ due to the axioms in $\mathcal{T}$ and $\mathcal{S}$. The effects of applying an action over $\mathcal{M}$ depend on a set of input parameters that will be instantiated at execution time, resulting in different assertions being added to or removed from $\mathcal{M}$. When assertions are added, fresh individuals might be introduced in the active domain of $\mathcal{M}$, including both model nodes from $I^{\mathcal{M}}$ and boundary nodes from $I^{\mathcal{A}}$. Conversely, when assertions are removed, individuals might be removed from the active domain of $\mathcal{M}$, including model nodes from $I^{\mathcal{M}}$ but not boundary nodes from $I^{\mathcal{A}}$. In fact, boundary nodes are owned by the open portion of the knowledge base and are known to exist regardless of whether they are used in $\mathcal{M}$. We invite the reader to review the set definitions for $\mathcal{A}$- and $\mathcal{M}$-assertions (Sect. 2) to note that it is indeed possible for a generic boundary individual $a$ involved in an $\mathcal{M}$-assertion to also be involved in an $\mathcal{A}$-assertion.


4.1 Syntax 


An action is defined by a signature and a body. The signature consists of an action name and a list of formal parameters, which will be replaced with actual parameters at execution time. The body, or action effect, can include conditional statements and concatenations of atomic operations over $\mathcal{M}$-assertions. For example, let $\alpha$ be the action $act(\vec{x}) = \gamma$; that is, the action denoted by signature $act(\vec{x})$ and body $\gamma$, with signature name $act$, signature parameters $\vec{x}$, and body effect $\gamma$. Since it contains unbound parameters, or free variables, action $\alpha$ is ungrounded and needs to be instantiated with actual values in order to be executed over a set $\mathcal{M}$. In the following, we assume the existence of a set $Var$ of variable names, and consider a generic input parameter substitution $\theta: Var \rightarrow I$, which replaces each variable name by an individual node. For simplicity, we will denote an ungrounded action by its effect $\gamma$, and a grounded action by the composition $\gamma\theta$ of its effect with an input parameter substitution $\theta$. Action effects can be either complex or basic. The syntax of complex action effects $\gamma$ and basic effects $\beta$ is constrained by the following grammar.


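$$
\begin{aligned}
\gamma &::= \varepsilon \mid \beta \cdot \gamma \mid [\varphi \rightsquigarrow \beta] \cdot \gamma\\
\beta &::= \oplus_x S \mid \ominus_x S \mid \odot_{x_{new}} S \mid \otimes_x
\end{aligned}
$$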


The complex action effects $\gamma$ include: the empty effect ($\varepsilon$), the execution of a basic effect followed by a complex one ($\beta \cdot \gamma$), and the conditional execution of a basic effect upon evaluation of a formula $\varphi$ over the set $\mathcal{M}$ ($[\varphi \rightsquigarrow \beta] \cdot \gamma$). The basic action effects $\beta$ include: the addition of a set $S$ of $\mathcal{M}$-assertions to the subset $\mathcal{M}_x$ ($\oplus_x S$), the removal of a set $S$ of $\mathcal{M}$-assertions from the subset $\mathcal{M}_x$ ($\ominus_x S$), the addition of a fresh subset $\mathcal{M}_{x_{new}}$ containing all the $\mathcal{M}$-assertions in the set $S$ ($\odot_{x_{new}} S$), and the removal of an existing subset $\mathcal{M}_x$ in its entirety ($\otimes_x$). The set $S$, the formula $\varphi$, and the subscripts of the operators might contain free variables. These variables are of two types: (1) variables that are replaced by the grounding of the action input parameters, and (2) variables that are the answer variables of the formula $\varphi$ and appear in the nested effect $\beta$.


Example 2. The following is the definition of the action createBucket from the API reference of the AWS resource type S3::Bucket. There are two input parameters: the new bucket name "name" and the canned access control list "acl" (one of Private, PublicRead, PublicReadWrite, AuthenticatedRead, etc.). The effect of the action is to add a fresh subset $\mathcal{M}_x$ for the newly introduced individual $x$, containing the two assertions $\mathit{S3{::}Bucket}(x)$ and $\mathit{accessControl}(x, y)$.

$$\mathit{createBucket}(x : \mathit{name},\ y : \mathit{acl}) = \odot_x\{\mathit{S3{::}Bucket}(x),\ \mathit{accessControl}(x, y)\} \cdot \varepsilon$$

The action needs to be instantiated by a specific parameter assignment, for example the substitution $\theta = [x \mapsto \mathit{DataBucket},\ y \mapsto \mathit{Private}]$, which binds the variable $x$ to the node DataBucket and the variable $y$ to the node Private, both taken from a pool of inactive nodes in $I$.
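Instantiation itself is a plain substitution over argument positions. A minimal sketch, assuming a hypothetical tuple encoding of assertions in which predicate names are never rewritten:

```python
from typing import Dict, FrozenSet, Tuple

Assertion = Tuple[str, ...]  # hypothetical encoding: (predicate, arg1[, arg2])


def ground(template: FrozenSet[Assertion], theta: Dict[str, str]) -> FrozenSet[Assertion]:
    """Replace every argument bound by theta; predicates and other arguments stay as they are."""
    return frozenset((f[0],) + tuple(theta.get(arg, arg) for arg in f[1:]) for f in template)


body = frozenset({("S3::Bucket", "x"), ("accessControl", "x", "y")})
theta = {"x": "DataBucket", "y": "Private"}
grounded = ground(body, theta)
# The fresh subset introduced by the grounded action is indexed by theta["x"].
fresh_subset = {theta["x"]: grounded}
```

Here `grounded` contains S3::Bucket(DataBucket) and accessControl(DataBucket, Private), and `fresh_subset` plays the role of the new subset $\mathcal{M}_{\mathit{DataBucket}}$.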


Action Query $\varphi$. The syntax introduced in the previous paragraph allows for complex actions that conditionally execute a basic effect $\beta$ depending on the evaluation of a formula $\varphi$ over $\mathcal{M}$. This is done via the construct $[\varphi \rightsquigarrow \beta] \cdot \gamma$. The formula $\varphi$ might have a set $\vec{y}$ of answer variables that appear free in its body and are then bound to concrete tuples of nodes during evaluation. The answer tuples are in turn used to instantiate the free variables in the nested effect $\beta$. We call $\varphi$ the action query, since we use it to select all the nodes that will be involved in the action effect. According to the grammar below, $\varphi$ is a Boolean combination of $\mathcal{M}$-assertions potentially containing free variables.

$$\varphi ::= A_S(t) \mid R_S(t_1, t_2) \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid \neg\varphi$$

In particular, $A_S$ is a symbol from the set $C^S$ of partially-closed concepts; $R_S$ is a symbol from the set $R^S$ of partially-closed roles; and $t, t_1, t_2$ are either individual or variable names from the set $I \uplus Var$, chosen in such a way that the resulting assertion is an $\mathcal{M}$-assertion. Since the formula $\varphi$ can only refer to $\mathcal{M}$-assertions, which are interpreted under a closed semantics, its evaluation only requires looking at the content of the set $\mathcal{M}$. A formula $\varphi$ with no free variables is a Boolean formula and evaluates to either true or false. A formula $\varphi$ with answer variables $\vec{y}$ and arity $ar(\varphi)$ evaluates to all the tuples $\vec{t}$, of size equal to the arity of $\varphi$, that make the formula true in $\mathcal{M}$. The free variables of $\varphi$ can only appear in the effect $\beta$ such that $\varphi \rightsquigarrow \beta$. We denote by $\mathrm{ANS}(\varphi, \mathcal{M})$ the set of answers to the action query $\varphi$ over $\mathcal{M}$. It is easy to see that the maximum number of tuples that could be returned by the evaluation (that is, the size of the set $\mathrm{ANS}(\varphi, \mathcal{M})$) is bounded by the number of individuals occurring in $\mathcal{M}$ raised to the power $ar(\varphi)$, in turn bounded by $(2|\mathcal{M}|)^{ar(\varphi)}$.


Example 3. The following example shows the encoding of the S3 API operation called deleteBucketEncryption, which requires as its unique input parameter the name of the bucket whose encryption configuration is to be deleted. Since a bucket can have multiple encryption configuration rules (each prescribing different encryption keys and algorithms to be used), we use an action query $\varphi$ to select all the nodes that match the structure of the assertions to be removed.

$$\varphi[y, k, z](x) = \mathit{S3{::}Bucket}(x) \wedge \mathit{encrRule}(x, y) \wedge \mathit{SSEKey}(y, k) \wedge \mathit{SSEAlgo}(y, z)$$

The query $\varphi$ is instantiated by the specific bucket instance (which will replace the variable $x$) and returns all the triples $(y, k, z)$ of encryption rule, key, and algorithm, respectively, which identify the assertions corresponding to the different encryption configurations that the bucket has. The answer variables are then used in the action effect to instantiate the assertions to remove from $\mathcal{M}_x$:

$$\mathit{deleteBucketEncryption}(x : \mathit{name}) = [\varphi[y, k, z](x) \rightsquigarrow \ominus_x\{\mathit{encrRule}(x, y),\ \mathit{SSEKey}(y, k),\ \mathit{SSEAlgo}(y, z)\}] \cdot \varepsilon$$
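Evaluating an action query amounts to finding all bindings that map its atoms onto assertions of $\mathcal{M}$, which can be done by a simple backtracking search because $\mathcal{M}$ is read under a closed semantics. The sketch below handles only conjunctions (no $\vee$ or $\neg$), marks variables with a leading `?` (an assumed convention), and pre-binds input parameters through `theta`. The encryption rules, keys, and algorithm names are invented for the illustration.

```python
from typing import Dict, FrozenSet, Iterator, List, Sequence, Set, Tuple

Fact = Tuple[str, ...]  # hypothetical encoding: (predicate, arg1[, arg2])


def is_var(term: str) -> bool:
    return term.startswith("?")  # assumed convention for variable names


def match(atoms: List[Fact], m: FrozenSet[Fact], binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Backtracking search for bindings that map every atom onto an assertion in M."""
    if not atoms:
        yield binding
        return
    first, rest = atoms[0], atoms[1:]
    for fact in m:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        extended = dict(binding)
        consistent = True
        for term, value in zip(first[1:], fact[1:]):
            if is_var(term):
                if extended.setdefault(term, value) != value:  # clash with an earlier binding
                    consistent = False
                    break
            elif term != value:                                 # constant does not match
                consistent = False
                break
        if consistent:
            yield from match(rest, m, extended)


def action_query_answers(atoms: Sequence[Fact], answer_vars: Sequence[str],
                         m: FrozenSet[Fact], theta: Dict[str, str]) -> Set[Tuple[str, ...]]:
    """ANS(phi, M) for a purely conjunctive phi, with input parameters pre-bound by theta."""
    return {tuple(b[v] for v in answer_vars) for b in match(list(atoms), m, dict(theta))}


m = frozenset({
    ("S3::Bucket", "b"),
    ("encrRule", "b", "r1"), ("SSEKey", "r1", "k1"), ("SSEAlgo", "r1", "AES256"),
    ("encrRule", "b", "r2"), ("SSEKey", "r2", "k2"), ("SSEAlgo", "r2", "aws:kms"),
})
phi = [("S3::Bucket", "?x"), ("encrRule", "?x", "?y"), ("SSEKey", "?y", "?k"), ("SSEAlgo", "?y", "?z")]
ans = action_query_answers(phi, ["?y", "?k", "?z"], m, {"?x": "b"})
# ans == {("r1", "k1", "AES256"), ("r2", "k2", "aws:kms")}
```

Each answer triple then instantiates one copy of the removal effect of deleteBucketEncryption, and the number of answers stays within the $(2|\mathcal{M}|)^{ar(\varphi)}$ bound discussed above.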


4.2 Semantics 


So far, we have described the syntax of our action language and provided two 
examples that showcase the encoding of real-world API calls. Now, we define the 
semantics of action effects with respect to the changes that they induce over a 
knowledge base. Let us recall that, given a substitution $\theta$ for the input parameters of an action $\gamma$, we denote by $\gamma\theta$ the grounded action where all the input variables are replaced as prescribed by $\theta$. Let us also recall that the effects of an action apply only to assertions in $\mathcal{M}$ and individuals from $I^{\mathcal{M}}$, and cannot affect nodes and assertions from the open portion of the knowledge base.

The execution of a grounded action $\gamma\theta$ over a DL-Lite$_\mathcal{F}$ core-closed knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M})$, defined over the set $I^{\mathcal{M}}$ of partially-closed individuals, generates a new knowledge base $\mathcal{K}^{\gamma\theta} = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M}^{\gamma\theta})$, defined over an updated set of partially-closed individuals $I^{\mathcal{M}^{\gamma\theta}}$. Let $S$ be a set of $\mathcal{M}$-assertions, $\gamma$ a complex action, $\theta$ an input parameter substitution, and $\mu$ a generic substitution that potentially replaces all free variables in the action $\gamma$. Let $\mu_1$ and $\mu_2$ be two substitutions with signature $Var \rightarrow I$ such that $dom(\mu_1) \cap dom(\mu_2) = \emptyset$; we denote their composition by $\mu_1\mu_2$ and define it as the new substitution such that $\mu_1\mu_2(x) = a$ if $\mu_1(x) = a \vee \mu_2(x) = a$, and $\mu_1\mu_2(x) = \bot$ if $\mu_1(x) = \bot \wedge \mu_2(x) = \bot$. We formalize the application of the grounded action $\gamma\theta$ as the transformation $T_{\gamma\theta}$ that maps the pair $(\mathcal{M}, I^{\mathcal{M}})$ into the new pair $(\mathcal{M}^{\gamma\theta}, I^{\mathcal{M}^{\gamma\theta}})$. We sometimes use the notation $T_{\gamma\theta}(\mathcal{M})$ or $T_{\gamma\theta}(I^{\mathcal{M}})$ to refer to the updated MBox or to the updated set of model nodes, respectively. The rules for applying the transformation depend on the structure of the action $\gamma$ and are reported in Fig. 1. The transformation starts with an initial generic substitution $\mu = \theta$. As the transformation progresses, the generic substitution $\mu$ can be updated only as a result of the evaluation of an action query $\varphi$ over $\mathcal{M}$. Precisely, all the tuples $\vec{t}_1, \dots, \vec{t}_n$ making $\varphi$ true in $\mathcal{M}$ are considered and composed with the current substitution $\mu$, generating $n$ fresh substitutions $\mu\vec{t}_1, \dots, \mu\vec{t}_n$, which are used in the subsequent application of the nested effect $\beta$. Since the core $\mathcal{M}$ of the knowledge base $\mathcal{K}$ changes at every action execution, its domain of model nodes $I^{\mathcal{M}}$ changes as well. The execution of an action $\gamma\theta$ over the knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M})$ with set of model nodes $I^{\mathcal{M}}$ could generate a new knowledge base $\mathcal{K}^{\gamma\theta} = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M}^{\gamma\theta})$, with a new set of model nodes $I^{\mathcal{M}^{\gamma\theta}}$, that is not core-complete or not open-consistent (see Sect. 3 for the corresponding definitions). We illustrate two examples next.


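$$
\begin{array}{lll}
T_{\varepsilon\mu}(\mathcal{M}, I^{\mathcal{M}}) & = (\mathcal{M}, I^{\mathcal{M}}) & \\
T_{(\beta\cdot\gamma)\mu}(\mathcal{M}, I^{\mathcal{M}}) & = T_{\gamma\mu}(T_{\beta\mu}(\mathcal{M}, I^{\mathcal{M}})) & \\
T_{([\varphi\rightsquigarrow\beta]\cdot\gamma)\mu}(\mathcal{M}, I^{\mathcal{M}}) & = T_{\gamma\mu}(T_{\beta\mu}(\mathcal{M}, I^{\mathcal{M}})) & \text{if } \mathrm{ANS}(\varphi\mu, \mathcal{M}) = \mathit{tt}\\
 & = T_{\gamma\mu}(\mathcal{M}, I^{\mathcal{M}}) & \text{if } \mathrm{ANS}(\varphi\mu, \mathcal{M}) = \emptyset \text{ or } \mathit{ff}\\
 & = T_{\gamma\mu}(T_{\beta\mu\vec{t}_1} \circ \cdots \circ T_{\beta\mu\vec{t}_n}(\mathcal{M}, I^{\mathcal{M}})) & \text{if } \mathrm{ANS}(\varphi\mu, \mathcal{M}) = \{\vec{t}_1, \dots, \vec{t}_n\}\\
T_{(\oplus_x S)\mu}(\mathcal{M}, I^{\mathcal{M}}) & = (\{\mathcal{M}_i\}_{i \neq x\mu} \cup \{\mathcal{M}_{x\mu} \cup S\mu\},\ I^{\mathcal{M}} \cup ind(S\mu)) & \\
T_{(\ominus_x S)\mu}(\mathcal{M}, I^{\mathcal{M}}) & = (\{\mathcal{M}_i\}_{i \neq x\mu} \cup \{\mathcal{M}_{x\mu} \setminus S\mu\},\ I^{\mathcal{M}} \setminus ind(S\mu)) & \\
T_{(\odot_{x_{new}} S)\mu}(\mathcal{M}, I^{\mathcal{M}}) & = (\mathcal{M} \cup \{\mathcal{M}_{x_{new}\mu} = S\mu\},\ I^{\mathcal{M}} \cup ind(S\mu)) & \\
T_{(\otimes_x)\mu}(\mathcal{M}, I^{\mathcal{M}}) & = (\mathcal{M} \setminus \{\mathcal{M}_{x\mu}\},\ I^{\mathcal{M}} \setminus ind(\mathcal{M}_{x\mu})) &
\end{array}
$$

Fig. 1. Semantics of the action language defined over the MBox $\mathcal{M}$ and set $I^{\mathcal{M}}$.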
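The rules of Fig. 1 can be prototyped over a dictionary that maps each subset index $x$ to $\mathcal{M}_x$. The sketch is hypothetical throughout: effects are nested tuples (`("eps",)`, `("seq", beta, gamma)`, `("cond", query, beta, gamma)`, and basic effects `("add", x, S)`, `("remove", x, S)`, `("new", x, S)`, `("drop", x)`), and action queries are Python callables that return a list of bindings (`[{}]` for tt, `[]` for ff or no answers). As a deliberate deviation from a literal reading of the removal rules, a node leaves the set of model nodes only when it no longer occurs anywhere in $\mathcal{M}$.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Fact = Tuple[str, ...]             # hypothetical encoding: (predicate, arg1[, arg2])
MBox = Dict[str, FrozenSet[Fact]]  # subset index x -> M_x
Subst = Dict[str, str]


def ind(facts) -> Set[str]:
    return {arg for f in facts for arg in f[1:]}


def apply_subst(f: Fact, mu: Subst) -> Fact:
    return (f[0],) + tuple(mu.get(arg, arg) for arg in f[1:])


def compose(mu1: Subst, mu2: Subst) -> Subst:
    """Composition mu1 mu2 of two substitutions with disjoint domains."""
    assert not (mu1.keys() & mu2.keys()), "domains must be disjoint"
    return {**mu1, **mu2}


def all_facts(mbox: MBox) -> FrozenSet[Fact]:
    return frozenset().union(*mbox.values())


def shrink(nodes: Set[str], candidates: Set[str], mbox: MBox) -> Set[str]:
    # Our choice: drop only candidates that no longer occur anywhere in M.
    return nodes - (candidates - ind(all_facts(mbox)))


def apply_basic(beta: tuple, mbox: MBox, nodes: Set[str], mu: Subst) -> Tuple[MBox, Set[str]]:
    kind, x = beta[0], mu.get(beta[1], beta[1])
    if kind == "drop":                       # remove the subset M_x entirely
        removed = mbox.get(x, frozenset())
        new = {i: s for i, s in mbox.items() if i != x}
        return new, shrink(nodes, ind(removed), new)
    s = frozenset(apply_subst(f, mu) for f in beta[2])
    if kind == "add":                        # add S to M_x
        return {**mbox, x: mbox.get(x, frozenset()) | s}, nodes | ind(s)
    if kind == "remove":                     # remove S from M_x
        new = {**mbox, x: mbox.get(x, frozenset()) - s}
        return new, shrink(nodes, ind(s), new)
    if kind == "new":                        # fresh subset M_x containing S
        return {**mbox, x: s}, nodes | ind(s)
    raise ValueError(f"unknown basic effect {kind!r}")


def apply_effect(gamma: tuple, mbox: MBox, nodes: Set[str], mu: Subst) -> Tuple[MBox, Set[str]]:
    kind = gamma[0]
    if kind == "eps":
        return mbox, nodes
    if kind == "seq":                        # beta . gamma
        mbox, nodes = apply_basic(gamma[1], mbox, nodes, mu)
        return apply_effect(gamma[2], mbox, nodes, mu)
    if kind == "cond":                       # [phi ~> beta] . gamma
        _, query, beta, rest = gamma
        for t in query(all_facts(mbox), mu):  # answers are computed once, on M before the updates
            mbox, nodes = apply_basic(beta, mbox, nodes, compose(mu, t))
        return apply_effect(rest, mbox, nodes, mu)
    raise ValueError(f"unknown effect {kind!r}")


# Example 4: deleteLoggingConfiguration grounded with theta = [x -> b].
def logging_query(m: FrozenSet[Fact], mu: Subst) -> List[Subst]:
    x = mu["x"]
    if ("S3::Bucket", x) not in m:
        return []
    return [{"y": f[2]} for f in m if f[0] == "loggingConfiguration" and f[1] == x]


delete_logging = ("cond", logging_query,
                  ("remove", "x", frozenset({("loggingConfiguration", "x", "y")})), ("eps",))
m0 = {"b": frozenset({("S3::Bucket", "b"), ("loggingConfiguration", "b", "c")})}
m1, nodes1 = apply_effect(delete_logging, m0, {"b", "c"}, {"x": "b"})
```

Running Example 4 this way leaves `m1` with only S3::Bucket(b) in $\mathcal{M}_b$, which is exactly the update that the core-completeness check of Lemma 2 flags.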


Example 4 (Violation of core-completeness). Consider the case where the general specifications of the system require all objects of type bucket to have a logging configuration, and an action that removes the logging configuration from a bucket. Consider the core-closed knowledge base $\mathcal{K}$ where $\mathcal{S} = \{\mathit{S3{::}Bucket} \sqsubseteq \exists\mathit{loggingConfiguration}\}$ and $\mathcal{M} = \{\mathit{S3{::}Bucket}(b),\ \mathit{loggingConfiguration}(b, c)\}$ (consistent w.r.t. $\mathcal{S}$), and the action $\gamma$ defined as

$$\mathit{deleteLoggingConfiguration}(x : \mathit{name}) = [(\varphi[y](x) = \mathit{S3{::}Bucket}(x) \wedge \mathit{loggingConfiguration}(x, y)) \rightsquigarrow \ominus_x\{\mathit{loggingConfiguration}(x, y)\}] \cdot \varepsilon$$

For the input parameter substitution $\theta = [x \mapsto b]$, it is easy to see that the transformation $T_{\gamma\theta}$ applied to $\mathcal{M}$ results in the update $\mathcal{M}^{\gamma\theta} = \{\mathit{S3{::}Bucket}(b)\}$, which is not core-complete.


Example 5 (Violation of open-consistency). Consider the case where an action application indirectly affects boundary nodes and their properties, leading to inconsistencies in the open portion of the knowledge base. For example, the knowledge base may prescribe that buckets used to store logs cannot be public, while a change in the configuration of one bucket instance causes a second bucket (initially known to be public) to also become a log store. In particular, this happens when the knowledge base $\mathcal{K}$ contains the $\mathcal{T}$-axiom $\exists\mathit{loggingDestination}^- \sqsubseteq \neg\mathit{PublicBucket}$ and the $\mathcal{A}$-assertion $\mathit{PublicBucket}(b)$, and we apply an action that introduces a new bucket storing its logs to $b$, defined as follows:

$$\mathit{createBucketWithLogging}(x : \mathit{name},\ y : \mathit{log}) = \odot_x\{\mathit{S3{::}Bucket}(x),\ \mathit{loggingDestination}(x, y)\} \cdot \varepsilon$$

For the input parameter substitution $\theta = [x \mapsto \mathit{newBucket},\ y \mapsto b]$, the result of applying the transformation $T_{\gamma\theta}$ is the set $\mathcal{M}^{\gamma\theta} = \{\mathit{S3{::}Bucket}(\mathit{newBucket}),\ \mathit{loggingDestination}(\mathit{newBucket}, b)\}$ which, combined with the pre-existing and unchanged sets $\mathcal{T}$ and $\mathcal{A}$, causes the updated $\mathcal{K}^{\gamma\theta}$ to be not open-consistent.
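The violation in Example 5 can be detected by checking negative inclusion and functionality axioms against the union of the explicit $\mathcal{A}$- and $\mathcal{M}$-assertions. The sketch below is a simplified stand-in for, not a reimplementation of, the algorithm Consistent of [15]: it only looks at explicit assertions and assumes that the list of negative inclusions has already been closed under the negative inclusion closure. Encodings and helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Fact = Tuple[str, ...]   # hypothetical encoding: (predicate, arg1[, arg2])
Basic = Tuple[str, str]  # ("atom", A) | ("exists", R) | ("exists_inv", R)


def members(db: FrozenSet[Fact], concept: Basic) -> Set[str]:
    """Individuals that explicitly belong to a basic concept in the assertions db."""
    kind, name = concept
    if kind == "atom":
        return {f[1] for f in db if len(f) == 2 and f[0] == name}
    if kind == "exists":
        return {f[1] for f in db if len(f) == 3 and f[0] == name}
    if kind == "exists_inv":
        return {f[2] for f in db if len(f) == 3 and f[0] == name}
    raise ValueError(f"unknown basic concept {concept!r}")


def open_consistency_violations(db: FrozenSet[Fact],
                                negative_inclusions: Iterable[Tuple[Basic, Basic]],
                                functional_roles: Iterable[str]) -> List[tuple]:
    """Witnesses for violated axioms B1 <= not B2 and Func(R) over the explicit assertions."""
    found: List[tuple] = []
    for b1, b2 in negative_inclusions:
        for a in sorted(members(db, b1) & members(db, b2)):
            found.append(("NI", b1, b2, a))
    for role in functional_roles:
        targets: Dict[str, Set[str]] = {}
        for f in db:
            if len(f) == 3 and f[0] == role:
                targets.setdefault(f[1], set()).add(f[2])
        for a in sorted(targets):
            if len(targets[a]) > 1:            # more than one filler for a functional role
                found.append(("Func", role, a))
    return found


abox = {("PublicBucket", "b")}
m_after = {("S3::Bucket", "newBucket"), ("loggingDestination", "newBucket", "b")}
ni = [(("exists_inv", "loggingDestination"), ("atom", "PublicBucket"))]
before = open_consistency_violations(frozenset(abox), ni, [])
after = open_consistency_violations(frozenset(abox | m_after), ni, [])
```

Before the action, `before` is empty; after it, `after` reports the boundary node `b`, which is now both a logging destination and a public bucket.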


From a practical point of view, the examples highlight the need to re-evaluate 
core-completeness and open-consistency of a core-closed knowledge base after 
each action execution. Detecting a violation of core-completeness signals that we have modeled an action that is inconsistent with respect to the system's specifications, which most likely means that the action is missing something and needs to be revised. Detecting a violation of open-consistency signals that our action,
even when consistent with respect to the specifications, introduces a change that 
conflicts with other assumptions that we made about the system, and generally 
indicates that we should either revise the assumptions or forbid the application 
of the action. Both cases are important to consider in the development life cycle 
of the core-closed KB and the action definitions. 


5 Static Verification 


In this section, we investigate the problem of computing whether the execution of 
an action, no matter the specific instantiation, always preserves given properties 
of core-closed knowledge bases. We focus on properties expressed as Must/May 
queries and define the static verification problem as follows. 


Definition 1 (Static Verification). Let $\mathcal{K}$ be a DL-Lite$_\mathcal{F}$ core-closed knowledge base, $q$ be a MUST/MAY query, and $\gamma$ be an action with free variables from the language presented above. Let $\theta$ be an assignment for the input variables of $\gamma$ that transforms $\gamma$ into the grounded action $\gamma\theta$. Let $\mathcal{K}^{\gamma\theta}$ be the DL-Lite$_\mathcal{F}$ core-closed knowledge base resulting from the application of the grounded action $\gamma\theta$ onto $\mathcal{K}$. We say that the action $\gamma$ "preserves $q$ over $\mathcal{K}$" iff for every grounded instance $\gamma\theta$ we have that $\mathrm{ANS}(q, \mathcal{K}) = \mathrm{ANS}(q, \mathcal{K}^{\gamma\theta})$. The static verification problem is that of determining whether an action $\gamma$ is $q$-preserving over $\mathcal{K}$.


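An action $\gamma$ is not $q$-preserving over $\mathcal{K}$ iff there exists a grounding $\theta$ for the input variables of $\gamma$ such that $\mathrm{ANS}(q, \mathcal{K}) \neq \mathrm{ANS}(q, \mathcal{K}^{\gamma\theta})$; that is, for a fixed grounding $\theta$, there exists a tuple $\vec{t}$ for $q$'s answer variables such that $\vec{t} \in \mathrm{ANS}(q, \mathcal{K}) \setminus \mathrm{ANS}(q, \mathcal{K}^{\gamma\theta})$ or $\vec{t} \in \mathrm{ANS}(q, \mathcal{K}^{\gamma\theta}) \setminus \mathrm{ANS}(q, \mathcal{K})$.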


Theorem 1 (Complexity of the Static Verification Problem). The static verification problem, i.e., deciding whether an action $\gamma$ is $q$-preserving over $\mathcal{K}$, can be decided in PTIME in data complexity and EXPTIME in the arities of $\gamma$ and $q$.
Proof. The proof relies on the fact that one could: enumerate all possible assignments $\theta$; compute the updated knowledge bases $\mathcal{K}^{\gamma\theta}$; check whether these are fully satisfiable; enumerate all tuples $\vec{t}$ for the query $q$; and, finally, check whether there exists at least one such tuple that satisfies $q$ over $\mathcal{K}$ but not over $\mathcal{K}^{\gamma\theta}$, or vice versa. The number of assignments $\theta$ is bounded by $(|I^{\mathcal{M}} \uplus I^{\mathcal{A}}| + ar(\gamma))^{ar(\gamma)}$, as it is sufficient to replace each variable appearing in the action $\gamma$ either by a known object from $I^{\mathcal{M}} \uplus I^{\mathcal{A}}$ or by a fresh one. The computation of the updated $\mathcal{K}^{\gamma\theta}$ is done in polynomial time in $\mathcal{M}$ (and is exponential in the size of the action $\gamma$), as it may require the evaluation of an internal action query $\varphi$ and the consecutive re-application of the transformation for a number of tuples that is bounded by a polynomial over the size of $\mathcal{M}$. As explained in Sect. 3, checking full satisfiability of the resulting core-closed knowledge base is also polynomial in $\mathcal{M}$. The number of tuples $\vec{t}$ is bounded by $(|I^{\mathcal{M}} \uplus I^{\mathcal{A}}| + ar(\gamma))^{ar(q)}$, as it is enough to consider all those tuples involving known objects plus the fresh individuals introduced by the assignment $\theta$. Checking whether a tuple $\vec{t}$ satisfies the query $q$ over a core-closed knowledge base is decided in LOGSPACE in the size of $\mathcal{M}$ [15], which is thus also polynomial in $\mathcal{M}$.
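The enumeration at the heart of this proof can be sketched as a brute-force check. The sketch assumes hypothetical callables for applying a grounded action, testing full satisfiability, and computing query answers; fresh individuals are represented by placeholder names `fresh0`, `fresh1`, and so on. Skipping groundings whose result is not fully satisfiable is our reading of the proof, not a statement from it.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Optional, Sequence, Tuple

Subst = Dict[str, str]


def groundings(params: Sequence[str], known: Sequence[str]) -> Iterator[Subst]:
    """Every theta mapping each parameter to a known object or to one of
    len(params) fresh placeholders (hypothetical names fresh0, fresh1, ...)."""
    pool = list(known) + [f"fresh{i}" for i in range(len(params))]
    for values in product(pool, repeat=len(params)):
        yield dict(zip(params, values))


def is_q_preserving(kb, params: Sequence[str], known: Sequence[str],
                    apply_action: Callable, fully_satisfiable: Callable,
                    answers: Callable) -> Tuple[bool, Optional[Subst]]:
    """Brute-force check following the proof of Theorem 1; returns a witness theta if not preserving."""
    before = answers(kb)
    for theta in groundings(params, known):
        updated = apply_action(kb, theta)
        if not fully_satisfiable(updated):
            continue  # assumption: groundings yielding unsatisfiable KBs are discarded
        if answers(updated) != before:
            return False, theta
    return True, None


known = ["b", "c", "k"]
count = sum(1 for _ in groundings(["x", "y"], known))  # (3 + 2) ** 2 assignments
```

The count matches the $(|I^{\mathcal{M}} \uplus I^{\mathcal{A}}| + ar(\gamma))^{ar(\gamma)}$ bound with three known objects and two parameters.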


6 Planning 


As discussed throughout the paper, the execution of a mutating action modi- 
fies the configuration of a deployment and potentially changes its posture with 
respect to a given set of requirements. In the previous two sections, we intro- 
duced a language to encode mutating actions and we investigated the problem 
of checking whether the application of an action preserves the properties of a 
core-closed knowledge base. In this section, we investigate the plan existence 
and synthesis problems; that is, the problem of deciding whether there exists 
a sequence of grounded actions that leads the knowledge base to a state where 
a certain requirement is met, and the problem of finding a set of such plans, 
respectively. We start by defining a notion of transition system that is gen- 
erated by applying actions to a core-closed knowledge base and then use this 
notion to focus on the mentioned planning problems. As in classical planning, 
the plan existence problem for plans computed over unbounded domains is undecidable [17,19]. The undecidability proof is done via a reduction from the word problem: the problem of deciding whether a deterministic Turing machine $M$ accepts a word $w \in \{0,1\}^*$ is reduced to the plan existence problem. Since undecidability holds even for basic action effects, we can show undecidability over an unbounded domain by using the same encoding as [1].


Transition Systems. In the style of the work done in [10,21], the combination of a DL-Lite$_\mathcal{F}$ core-closed knowledge base and a set of actions can be viewed as the transition system it generates. Intuitively, the states of the transition system correspond to MBoxes and the transitions between states are labeled by grounded actions. A DL-Lite$_\mathcal{F}$ core-closed knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M}_0)$, defined over the possibly infinite set of individuals $I$ (and model nodes $I^{\mathcal{M}_0} \subseteq I$), and the set Act of ungrounded actions generate the transition system (TS) $\Upsilon_\mathcal{K} = (I, \mathcal{T}, \mathcal{A}, \mathcal{S}, \Sigma, \mathcal{M}_0, \rightarrow)$, where $\Sigma$ is a set of fully satisfiable (i.e., core-complete and open-consistent) MBoxes; $\mathcal{M}_0$ is the initial MBox; and ${\rightarrow} \subseteq \Sigma \times L_{Act} \times \Sigma$ is a labeled transition relation, with $L_{Act}$ the set of all possible grounded actions. The sets $\Sigma$ and $\rightarrow$ are defined by mutual induction as the smallest sets such that: if $\mathcal{M}_i \in \Sigma$, then for every grounded action $\gamma\theta \in L_{Act}$ such that the fresh MBox $\mathcal{M}_{i+1}$ resulting from the transformation $T_{\gamma\theta}$ is core-complete and open-consistent, we have that $\mathcal{M}_{i+1} \in \Sigma$ and $(\mathcal{M}_i, \gamma\theta, \mathcal{M}_{i+1}) \in {\rightarrow}$.

Since we assume that actions have input parameters that are replaced during execution by values from $I$, which contains both known objects from $I^{\mathcal{M}} \uplus I^{\mathcal{A}}$ and possibly infinitely many fresh objects, the generated transition system $\Upsilon_\mathcal{K}$ is generally infinite. To keep the planning problem decidable, we concentrate on a known finite subset $D \subseteq I$ containing all the fresh nodes and value assignments to action variables that are of interest for our application. In the remainder of this paper, we discuss the plan existence and synthesis problems for finite transition systems $\Upsilon_\mathcal{K} = (D, \mathcal{T}, \mathcal{A}, \mathcal{S}, \Sigma, \mathcal{M}_0, \rightarrow)$, whose states in $\Sigma$ have a domain that is also bounded by $D$.


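The Plan Existence Problem. A plan is a sequence of grounded actions whose execution leads to a state satisfying a given property. Let $\mathcal{K} = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M}_0)$ be a DL-Lite$_\mathcal{F}$ core-closed knowledge base, Act a set of ungrounded actions, and $\Upsilon_\mathcal{K} = (D, \mathcal{T}, \mathcal{A}, \mathcal{S}, \Sigma, \mathcal{M}_0, \rightarrow)$ its generated finite TS. Let $\pi$ be a finite sequence $\gamma_1\theta_1 \cdots \gamma_n\theta_n$ of grounded actions taken from the set $L_{Act}$. We call the sequence $\pi$ consistent iff there exists a run $\rho = \mathcal{M}_0 \xrightarrow{\gamma_1\theta_1} \mathcal{M}_1 \xrightarrow{\gamma_2\theta_2} \cdots \xrightarrow{\gamma_n\theta_n} \mathcal{M}_n$ in $\Upsilon_\mathcal{K}$. Let $q$ be a MUST/MAY query mentioning objects from $adom(\mathcal{K})$ and $\vec{t}$ a tuple from the set $adom(\mathcal{K})^{ar(q)}$. A consistent sequence $\pi$ of grounded actions is a plan from $\mathcal{K}$ to $(\vec{t}, q)$ iff $\vec{t} \in \mathrm{ANS}(q, \mathcal{K}_n = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M}_n))$, with $\mathcal{M}_n$ the final state of the run induced by $\pi$.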


Definition 2 (Plan Existence). Given a DL-Lite$_\mathcal{F}$ core-closed knowledge base $\mathcal{K}$, a tuple $\vec{t}$, and a MUST/MAY query $q$, the plan existence problem is that of deciding whether there exists a plan from $\mathcal{K}$ to $(\vec{t}, q)$.


Example 6. Let us consider the transition system $\Upsilon_\mathcal{K}$ generated by the core-closed knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M}_0)$ having the set of partially-closed assertions $\mathcal{M}_0$ defined as

$$\{\mathit{S3{::}Bucket}(b),\ \mathit{KMS{::}Key}(k),\ \mathit{bucketEncryptionRule}(b, r),\ \mathit{bucketKey}(r, k),\ \mathit{bucketKeyEnabled}(r, \mathit{true}),\ \mathit{enableKeyRotation}(k, \mathit{false})\}$$

and the set of action labels Act containing the actions deleteBucket, createBucket, deleteKey, createKey, enableKeyRotation, putBucketEncryption, and deleteBucketEncryption. Let us assume that we are interested in verifying the existence of a sequence of grounded actions that, when applied to the knowledge base, would configure the bucket node $b$ to be encrypted with a rotating key. Formally, this is equivalent to checking the existence of a consistent plan $\pi$ that, when executed on the transition system $\Upsilon_\mathcal{K}$, leads to a state $\mathcal{M}_n$ such that the tuple $\vec{t} = b$ is in the set $\mathrm{ANS}(q, \mathcal{K}_n = (\mathcal{T}, \mathcal{A}, \mathcal{S}, \mathcal{M}_n))$ for $q$ the query

$$q[x] = \mathit{S3{::}Bucket}(x) \wedge \mathrm{MUST}\,(\exists y, z.\ \mathit{bucketSSEncryption}(x, y) \wedge \mathit{bucketKey}(y, z) \wedge \mathit{enableKeyRotation}(z, \mathit{true}))$$

It is easy to see that the following three sequences of grounded actions are valid plans from $\mathcal{K}$ to $(b, q)$:

$$
\begin{aligned}
\pi_1 &= \mathit{enableKeyRotation}(k)\\
\pi_2 &= \mathit{createKey}(k_1) \cdot \mathit{enableKeyRotation}(k_1) \cdot \mathit{putBucketEncryption}(b, k_1)\\
\pi_3 &= \mathit{deleteBucketEncryption}(b, k) \cdot \mathit{createKey}(k_1) \cdot \mathit{enableKeyRotation}(k_1) \cdot \mathit{putBucketEncryption}(b, k_1)
\end{aligned}
$$

If, for example, a bucket were only allowed to have one encryption (by means of a functionality axiom in $\mathcal{S}$), then $\pi_2$ would not be a valid plan, as it would generate an inconsistent run leading to a state $\mathcal{M}_3$ that is not open-consistent w.r.t. $\mathcal{S}$.


Lemma 3. The plan existence problem for a finite transition system $\Upsilon_\mathcal{K}$ generated by a DL-Lite$_\mathcal{F}$ core-closed knowledge base $\mathcal{K}$ and a set of actions Act, over a finite domain of objects $D$, reduces to graph reachability over a graph whose number of states is at most exponential in the size of $D$.
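Lemma 3 suggests a direct procedure: explore the finite transition system from $\mathcal{M}_0$ and stop at the first state that satisfies the goal. The sketch below is a plain breadth-first search over a hypothetical successor function that is assumed to return only fully satisfiable MBoxes, encoded as hashable states (for instance, frozensets of assertions). Since each state is enqueued at most once, the work is linear in the size of the explored graph, whose number of states Lemma 3 bounds exponentially in the size of $D$.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

State = Hashable   # e.g. a frozenset of assertions (hypothetical MBox encoding)
Label = str        # a grounded action such as "enableKeyRotation(k)"


def plan_exists(initial: State,
                successors: Callable[[State], Iterable[Tuple[Label, State]]],
                goal: Callable[[State], bool]) -> Optional[Tuple[Label, ...]]:
    """Breadth-first reachability in the finite TS; returns a shortest plan or None."""
    parent: Dict[State, Optional[Tuple[State, Label]]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if goal(state):
            plan = []
            while parent[state] is not None:      # walk back to the initial MBox
                state, label = parent[state]
                plan.append(label)
            return tuple(reversed(plan))
        for label, nxt in successors(state):
            if nxt not in parent:                 # each state is visited at most once
                parent[nxt] = (state, label)
                queue.append(nxt)
    return None


edges = {"M0": [("a", "M1"), ("b", "M2")], "M1": [("c", "M3")], "M2": [("d", "M0")], "M3": []}
shortest = plan_exists("M0", lambda s: edges[s], lambda s: s == "M3")
```

On this toy graph, `shortest` is the plan `("a", "c")`; an unreachable goal yields `None`.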


The Plan Synthesis Problem. We now focus on the problem of finding plans that satisfy a given condition. As discussed in the previous paragraph, we are mostly driven by query answering; in particular, by conditions corresponding to a tuple (of objects from our starting deployment configuration) satisfying a given requirement expressed as a MUST/MAY query. This problem is meaningful in our application of interest because it corresponds to finding a set of potential sequences of changes that would allow one to reach a configuration satisfying (resp., not satisfying) one or more security mitigations (resp., vulnerabilities). We concentrate on DL-Lite$_\mathcal{F}$ core-closed knowledge bases and their generated finite transition systems, where potential fresh objects are drawn from a fixed set $D$. We are interested in sequences of grounded actions that are minimal, and ignore sequences that extend them. We sometimes call such minimal sequences simple plans. A plan $\pi$ from an initial core-closed knowledge base $\mathcal{K}$ to a goal condition $b$ is minimal (or simple) iff there does not exist a plan $\pi'$ (from the same initial $\mathcal{K}$ to the same goal condition $b$) such that $\pi = \pi' \cdot \sigma$, for $\sigma$ a non-empty suffix of grounded actions.

In Algorithm 1, we present a depth-first search algorithm that, starting from 
K, searches for all simple plans that achieve a given target query membership 
condition. The transition system $\Upsilon_\mathcal{K}$ is computed, and stored, on the fly in the Successors sub-procedure, and the graph is explored in a depth-first fashion.
Algorithm 1: FindPlans(K, D, Act, (t, q))

Inputs: A ccKB K = (T, A, S, M_0), a domain D, a set of actions Act, and a pair (t, q) of an answer tuple and a MUST/MAY query
Output: A possibly empty set Π of consistent simple plans

def FindPlans(K, D, Act, (t, q)):
    Π := ∅;
    S := ∅;
    AllPlanSearch(M_0, ε, ∅, K, D, Act, (t, q));
    return Π;

def AllPlanSearch(M, π, V, K, D, Act, (t, q)):
    if M ∈ V then
        return;
    if t ∈ ANS(q, (T, A, S, M)) then
        Π := Π ∪ {π};
        return;
    Q := ∅;
    foreach (γθ, M') ∈ Successors(M, Act, D) do
        Q.push((γθ, M'));
    V := V ∪ {M};
    while Q ≠ ∅ do
        (γθ, M') := Q.pop();
        AllPlanSearch(M', π · γθ, V, K, D, Act, (t, q));
    V := V ∖ {M};
    return;

def Successors(M, Act, D):
    if S[M] is defined then
        return S[M];
    N := ∅;
    foreach γ ∈ Act, θ ∈ D^ar(γ) do
        M' := T_γθ(M);
        if M' is fully satisfiable then
            N := N ∪ {(γθ, M')};
    S[M] := N;
    return N;


We note that the condition t ∈ ANS(q, (T, A, S, M)) (line 9) could be
replaced by any other query satisfiability condition and that one could easily
rewrite the algorithm to be parameterized by a more general boolean goal: for
example, the condition that a given tuple t is not an answer to a query q over
the analyzed state, with the query q representing an undesired configuration,
or a boolean formula over multiple query membership assertions. We also note
that Algorithm 1 could be simplified to return only one simple plan, if a plan
exists, or NULL, if a plan does not exist, thus solving the so-called plan generation
problem. We refer the reader to the full version of this paper [16] containing the
plan generation algorithm (full version, Appendix A.1) and the proofs of Theorems 2 and 3 below (full version, Appendices A.2 and A.3, respectively).
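The following Python sketch mirrors the control flow of Algorithm 1 under simplifying assumptions that are ours, not part of the formal setting: a core M is any hashable value (for instance a frozenset of assertions), a grounded action γθ is a pair of a label and a function that returns the successor core, or None when the resulting knowledge base is not fully satisfiable, and the membership test t ∈ ANS(q, ·) is replaced by an arbitrary boolean goal, as suggested above. The names find_plans, successors and search are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Optional

# Hypothetical encoding: a core M is any hashable value; a grounded action is
# (label, apply), where apply returns the successor core, or None when the
# resulting knowledge base would not be fully satisfiable.
State = Hashable
GroundAction = tuple[str, Callable[[State], Optional[State]]]


def find_plans(init: State,
               actions: Iterable[GroundAction],
               goal: Callable[[State], bool]) -> list[tuple[str, ...]]:
    """Enumerate simple plans by depth-first search, following Algorithm 1."""
    actions = list(actions)
    plans: list[tuple[str, ...]] = []
    succ_cache: dict[State, list[tuple[str, State]]] = {}  # memoized successors (role of S)

    def successors(m: State) -> list[tuple[str, State]]:
        if m not in succ_cache:
            out = []
            for label, apply in actions:
                m_next = apply(m)
                if m_next is not None:  # keep only satisfiable successors
                    out.append((label, m_next))
            succ_cache[m] = out
        return succ_cache[m]

    def search(m: State, plan: tuple[str, ...], visited: set) -> None:
        if m in visited:       # state already on the current path
            return
        if goal(m):            # goal reached: record the plan, do not extend it
            plans.append(plan)
            return
        visited.add(m)
        for label, m_next in successors(m):
            search(m_next, plan + (label,), visited)
        visited.discard(m)     # backtrack, as in V := V \ {M}

    search(init, (), set())    # () plays the role of the empty plan ε
    return plans
```

Algorithm 1 pops successors from a stack Q, whereas the sketch iterates over them in order; the visiting order differs, but the set of returned plans is the same. For example, starting from frozenset({"public"}) with one hypothetical action deleting "public" and another adding "encrypted", and the goal that "public" is absent and "encrypted" present, the sketch returns both orderings of the two actions as simple plans.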


Theorem 2 (Minimal Plan Synthesis Correctness). Let K be a DL-Lite”
core-closed knowledge base, D be a fixed finite domain, Act be a set of ungrounded
action labels, and (t, q) be a goal. Then a plan π is returned by the algorithm
FindPlans(K, D, Act, (t, q)) if and only if π is a minimal plan from K to (t, q).


Theorem 3 (Minimal Plan Synthesis Complexity). The FindPlans algorithm
runs in polynomial time in the size of M and in exponential time in the size of D.


7 Related Work 


The syntax of the action language that we presented in this paper is similar to
that of [1,12,13]. Unlike their work, we disallow complex action effects
to be nested inside conditional statements, and we define basic action effects that
consist purely of the addition and deletion of concept and role M-assertions.
Thus, our actions are much less general than those used in their framework.
The semantics of their action language is defined in terms of changes applied to
instances, and the action effects are captured and encoded through a variant of
ALCHOIQ called ALCHOIQ_br. In our work, instead, the execution of an action
updates a portion of the core-closed knowledge base K, namely the core M, which is
interpreted under a closed-world assumption and can be seen as a partial assignment
for the interpretations that are models of K. Since we directly manipulate
M, the semantics of our actions is more similar to that of [21] and, in general, to
ABox updates [22,23]. Like the frameworks introduced in [9-11, 20], our actions
are parameterized and, when combined with a core-closed knowledge base, generate
a transition system. In [11], the authors focus on a variant of Knowledge and
Action Bases [21] called Explicit-Input KABs (eKABs); in particular, on finite
and on state-bounded eKABs, for which plan existence is decidable. Our
generated transition systems are an adaptation of the work done in Description
Logic based Dynamic Systems, KABs, and eKABs to our setting of core-closed
knowledge bases. In [24], the authors address decidability of the plan existence
problem for logics that are subsets of ALCOI. Their action language is similar
to the one presented in this paper, including pre-conditions in the form of a
set of ABox assertions, post-conditions in the form of basic addition or removal
of assertions, concatenation, and input parameters. In [11], the plan synthesis


Actions over Core-Closed Knowledge Bases 297 


problem is discussed also for lightweight description logics. Relying on the
FOL-reducibility of DL-Lite_A, it is shown that plan synthesis over DL-Lite_A can be
compiled into an ADL planning problem [25]. This does not seem possible in our
case, as not all necessary tests over core-closed knowledge bases are known to be
FOL-reducible. In [10] and [9], the authors concentrate on verifying and synthesizing
temporal properties expressed in a variant of μ-calculus over description
logic based dynamic systems; both problems are relevant in our application
scenario and we will consider them in future work.


8 Conclusion 


We focused on the problem of analyzing cloud infrastructure encoded as descrip- 
tion logic knowledge bases combining complete and incomplete information. 
From a practical standpoint, we concentrated on formalizing and foreseeing the 
impact of potential changes pre-deployment. We introduced an action language 
to encode mutating actions, whose semantics is given in terms of changes induced 
to the complete portion of the knowledge base. We defined the static verifica- 
tion problem as the problem of deciding whether the execution of an action, no 
matter the specific parameters passed, always preserves a set of properties of 
the knowledge base. We characterized the complexity of the problem and pro- 
vided procedural steps to solve it. We then focused on three formulations of the 
classical AI planning problem: namely, plan existence, generation, and synthesis. 
In our setting, the planning problem is formulated with respect to the transi- 
tion system arising from the combination of a core-closed knowledge base and 
a set of actions; goals are given in terms of one or more Must/May conjunctive
query membership assertions; and plans of interest are simple sequences of
parameterized actions.
Abstract. Our goal is to develop a logic-based component for hybrid
(machine learning plus logic) commonsense question answering systems.
The paper presents an implementation GK of default logic for handling 
rules with exceptions in unrestricted first order knowledge bases. GK is 
built on top of our existing automated reasoning system with confidence 
calculation capabilities. To overcome the problem of undecidability of 
checking potential exceptions, GK performs delayed recursive checks with 
diminishing time limits. These are combined with the taxonomy-based 
priorities for defaults and numerical confidences. 


1 Introduction 


The problem of handling uncertainty is one of the critical issues when considering 
the use of logic for automating commonsense reasoning. Most of the facts and 
rules people use in their daily lives are uncertain. There are many types of 
uncertainty, like fuzziness (is a person somewhat tall or very tall), confidence 
(how certain does some fact seem) and exceptions (birds can typically fly, but 
penguins, ostriches etc., cannot). Some of these uncertainties, like fuzziness
and confidence, can be represented numerically, while others, like rules with 
exceptions, are discrete. In [18] we present the design and implementation of 
the CONFER framework for extending existing automated reasoning systems 
with confidence calculation capabilities. In the current paper we present the 
implementation called GK for default logic [13], built by further extending the 
CONFER implementation. Importantly, we design a novel practical framework 
for implementing default logic for the full, undecidable first order logic on the 
basis of a conventional resolution prover. 


1.1 Default Logic 


Default logic was introduced in 1980 by R. Reiter [13] to model one aspect of 
common-sense reasoning: rules with exceptions. It has remained one of the most 
well-known logic-based mechanisms devoted to this goal, with circumscription
by J. McCarthy and autoepistemic logic being the early alternatives.
Several similar systems have been proposed later, like defeasible logic [11].

Default logic [13] extends classical logic with default rules of the form

$$\frac{\alpha(\bar{x}) : \beta_1(\bar{x}), \ldots, \beta_n(\bar{x})}{\gamma(\bar{x})}$$

where a precondition $\alpha(\bar{x})$, justifications $\beta_1(\bar{x}), \ldots, \beta_n(\bar{x})$ and a consequent $\gamma(\bar{x})$
are first order predicate calculus formulas whose free variables are among
$\bar{x} = x_1, \ldots, x_m$. For every tuple of individuals $\bar{t} = t_1, \ldots, t_m$, if the precondition $\alpha(\bar{t})$
is derivable and none of the negated justifications $\neg\beta_i(\bar{t})$ are derivable from a given
knowledge base KB, then the consequent $\gamma(\bar{t})$ can be derived from KB. Unlike
classical and most other logics, default logic is non-monotonic: adding
new assumptions can make some previously derivable formulas non-derivable.

As investigated in [7], the interpretation of quantifiers in default rules can
lead to several versions of default logic. We follow the original interpretation
of Reiter in [13], which requires Skolemization to be applied to default rules in a
specific manner. For example, the default rule $: \exists x P(x) / \exists x P(x)$ should be
interpreted as $: P(c) / P(c)$, where $c$ is a Skolem constant.

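Consider a typical example for default logic: birds can normally fly, but penguins
cannot fly. The classical logic part

$$\mathit{penguin}(p) \land \mathit{bird}(b) \land \forall x.\,(\mathit{penguin}(x) \Rightarrow \mathit{bird}(x)) \land \forall x.\,(\mathit{penguin}(x) \Rightarrow \neg\mathit{fly}(x))$$

is extended with the default rule $\mathit{bird}(x) : \mathit{fly}(x) / \mathit{fly}(x)$. From here we can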
derive that an arbitrary bird b can fly, but a penguin p cannot. The default 
rule cannot be applied to p, since a contradiction is derivable from fly(p). This 
argument cannot be easily modelled using numerical confidences: the probability 
of an arbitrary living bird being able to fly is relatively high, while the penguins 
form a specific subset of birds, for which this probability is zero. 

Another well-known example, Nixon's triangle, introduces the problem of
multiple extensions and sceptical vs. credulous entailment: the classical
facts republican(nixon) & quaker(nixon) are extended with two mutually excluding
default rules $\mathit{republican}(x) : \neg\mathit{pacifist}(x) / \neg\mathit{pacifist}(x)$ and
$\mathit{quaker}(x) : \mathit{pacifist}(x) / \mathit{pacifist}(x)$. The credulous entailment allows giving different
priorities to the default rules and accepts different sets (extensions) of consequences, if
there is a way to assign priorities so that all the consequences in an extension can 
be derived. The sceptical entailment requires that a consequence is present in all 
extensions. GK follows the latter interpretation, but allows explicit priorities to 
be assigned to the default rules. 
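To make the difference between sceptical and credulous entailment concrete, the following Python sketch enumerates the extensions of a normal default theory in which all formulas are ground literals and no classical rules are present. This is a strong simplification and not how GK works (GK checks blockers instead of computing extensions); it relies on the fact that, for normal defaults, applying defaults greedily in every possible order yields exactly the extensions. The function names are illustrative, and the enumeration is factorial in the number of defaults, so it is only meant for tiny examples.

```python
from itertools import permutations


def neg(lit: str) -> str:
    """Complement of a literal in the toy encoding "p" / "-p"."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def extensions(facts, defaults):
    """All extensions of a normal default theory over ground literals.

    Each default is (prerequisites, consequent); being normal, its
    justification equals its consequent. No classical inference is modelled.
    """
    result = set()
    for order in permutations(defaults):
        ext = set(facts)
        changed = True
        while changed:  # apply defaults in this order until none is applicable
            changed = False
            for pre, cons in order:
                if pre <= ext and cons not in ext and neg(cons) not in ext:
                    ext.add(cons)
                    changed = True
        result.add(frozenset(ext))
    return result


def sceptical(exts):
    """Literals that belong to every extension."""
    return frozenset.intersection(*exts)


def credulous(exts):
    """Literals that belong to at least one extension."""
    return frozenset.union(*exts)
```

For Nixon's triangle the sketch finds two extensions, one containing pacifist and one containing -pacifist. The sceptical consequences are only the facts republican and quaker, in line with the sceptical reading adopted by GK, whereas both pacifist and -pacifist are credulous consequences.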

The concept of priorities for default rules has been well investigated, with 
several mechanisms proposed. G. Brewka argues in [4] that “for realistic applica- 
tions involving default reasoning it is necessary to reason about the priorities of 
defaults” and introduces an ordering of defaults based on specificity: default rules 
for a more specific class of objects should take priority over rules for more gen- 
eral classes. For example, since birds (who typically do fly) are physical objects 
and physical objects typically do not fly, we have contradictory default rules
describing the flying capability of arbitrary birds. Since birds are a subset of 
physical objects, the flying rule of birds should have a higher priority than the 
non-flying rule of physical objects. 


1.2 Undecidability, Grounding and Implementations 


Perhaps the most significant problem standing in the way of automating default 
logic is undecidability of the applicability of rules. Indeed, in order to apply a 
default rule, we must prove that the justifications do not lead to a contradic- 
tion with the rest of the knowledge base KB. For full first order logic this is 
undecidable. Hence, the standard approach for handling default logic has been
to create a large ground instance KB_g of the KB and then to perform decidable
propositional reasoning on KB_g.

Almost all the existing implementations of default logic like DeReS [5], DLV2 
[1] or CLINGO [8], with the noteworthy exception of s(CASP) [2], follow the 
same principle. More generally, the field of Answer Set Programming (ASP), see 
[10], is devoted to this approach. In contrast, the s(CASP) system [2] solves
queries without the grounding step and is thus better suited for large domains.
It is noteworthy that the s(CASP) system has been used in [9] for automating
common sense reasoning for autonomous driving with the help of default rules. 
However, s(CASP) is a logic programming system, not a universal automated 
reasoner. For example, when we add a rule bird(father(X)) :- bird(X) to 
the formulation of the above birds example in s(CASP), the search does not 
terminate, apparently due to the infinitely growing nesting of terms. 

While ASP systems are very well suited for specific kinds of problems over a 
small finite domain, grounding becomes infeasible for large first order knowledge 
bases (KB in the following), in particular when the domain is infinite and nested 
terms can be derived from the KB. The approach described in this paper accepts 
the lack of logical omniscience and performs delayed recursive checking of excep- 
tions with diminishing time limits directly on non-grounded clauses, combined 
with the taxonomy-based priorities for defaults and numerical confidences. 


2 Algorithms 


Our approach of implementing default rules in GK for first order logic is to 
delay justification checking until a first-order proof is found and then perform 
recursively deepening checks with diminishing time limits. Thus, our system first 
produces a potentially large number of different candidate proofs and then enters 
a recursive checking phase. The idea of delaying justification checking is already 
present in the original paper of R. Reiter [13], where he uses linear resolution 
and delayed checks as the main machinery of his proofs. The results produced by 
GK thus depend on the time limits and are not stable. Showing specific fixpoint 
properties of the algorithm is not in the scope of our paper. 
A practical question for implementation is the actual representation of default
rules and making the rules fit the first-order proof search machinery. To this 
end we introduce blocker atoms which are similar to the justification indexes of 
Reiter. 

In the following we will assume that the underlying first order reasoner uses 
the resolution method, see [3] for details. The rest of the paper assumes famil- 
iarity with the basic concepts, terminology and algorithms of the resolution 
method. 


2.1 Background: Queries and Answers 


We assume our system is presented with a question in one of two forms: (1) Is 
the statement Q true? (2) Find values V for existentially bound variables in Q 
so that Q is true. For simplicity’s sake we will assume that the statement Q is in 
the prefix form, i.e., no quantifiers occur in the scope of other logical connectives. 

In the second case, it could be that several different value vectors can be 
assigned to the variables, essentially giving different answers. We also note that 
an answer could be a disjunction, giving possible options instead of a single 
definite answer. 

A widely used machinery in resolution-based theorem provers for extracting 
values of existentially bound variables in Q is to use a special answer predicate, 
converting a question statement Q to a formula $\exists X (Q(X) \land \neg\mathit{answer}(X))$ for
a tuple of existentially quantified variables X in Q [6]. Whenever a clause is
derived which consists only of answer predicates, it is treated as a contradiction
(essentially, an answer) and the arguments of the answer predicate are returned as
the values looked for. A common convention is to call such clauses answer clauses. 
We will require that the proof search does not stop whenever an answer clause 
is found, but will continue to look for new answer clauses until a predetermined 
time limit is reached. See [16] for a framework of extracting multiple answers. 
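The answer-clause convention can be illustrated with a small Python sketch. It assumes a hypothetical prover that exposes the clauses it derives as an iterable stream; Lit, is_answer_clause and collect_answers are illustrative names, not part of GK or GKC, and unification is not modelled.

```python
import time
from typing import Iterable, List, NamedTuple


class Lit(NamedTuple):
    """A literal: sign, predicate name and argument tuple (toy representation)."""
    positive: bool
    pred: str
    args: tuple


def is_answer_clause(clause: List[Lit]) -> bool:
    """A non-empty clause built only from answer literals is an answer clause."""
    return bool(clause) and all(lit.pred == "answer" for lit in clause)


def collect_answers(derived: Iterable[List[Lit]], time_limit: float) -> List[List[tuple]]:
    """Scan derived clauses, collecting every answer clause until the time limit.

    The search does not stop at the first answer; a disjunctive answer clause
    yields a list of alternative value tuples.
    """
    deadline = time.monotonic() + time_limit
    answers = []
    for clause in derived:
        if time.monotonic() > deadline:
            break
        if is_answer_clause(clause):
            answers.append([lit.args for lit in clause])
    return answers
```

A clause such as answer(p) ∨ answer(q) in the stream would thus be reported as the alternatives p or q rather than as a single definite answer.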

We also assume that queries take a general form $(KB \land A) \Rightarrow Q$ where KB is a
commonsense knowledge base, A is an optional set of precondition statements for
this particular question and Q is a question statement. The whole general query 
form is negated and converted to clauses, i.e., disjunctions of literals (positive or 
negative atoms). We will call the clauses stemming from the question statement 
question clauses. 


2.2 Blocker Atoms and Justification Checking 


Without loss of generality we assume that the precondition and consequent
formulas $\alpha$ and $\gamma$ in default rules are clauses and the justifications $\beta_1, \ldots, \beta_n$ are
literals, i.e. positive or negative atoms: $\alpha : \beta_1, \ldots, \beta_n / \gamma$. Complex formulas can
be encoded with a new predicate over the free variables of the formula and an
equivalence of the new atom with the formula. Recall that Reiter assumes that
the default rules are Skolemized.

We encode a default rule as a clause by concatenating into one clause the
precondition and consequent clauses $\alpha(\bar{x})$ and $\gamma(\bar{x})$ and blocker atoms $\mathit{block}(\neg\beta_1), \ldots, \mathit{block}(\neg\beta_n)$,
where each justification $\beta_i$ is either a positive or a negative atom.
The negation $\neg$ is used since we prefer to speak about blockers and not
justifications. For example, the “birds can fly” default rule is represented as a clause

¬bird(X) ∨ fly(X) ∨ block(0, neg(fly(X)))


where X is a variable and neg(fly(X)) encodes the negated justification. The first 
argument of the blocker (0 above) encodes priority information covered in the 
next section. 

A proof of a question clause is a clause containing only answer atoms and
blocker atoms. In the justification checking phase the system attempts to prove
each decoded second blocker argument $\neg\beta_i$ in turn: the proof is considered
invalid if some $\neg\beta_i$ can be proved and this checking proof is itself valid. If
we pose a question fly(X) ⇒ answer(X) to the system (see the
earlier example), we get two different answers: answer(p) ∨ block(neg(fly(p)))
and answer(b) ∨ block(neg(fly(b))). Checking the first of these means trying to
prove ¬fly(p), which succeeds, hence the first answer is invalid. Checking the
second answer we try to prove ¬fly(b), which fails, hence the answer is valid.
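The justification check on the fly example can be mimicked with a propositional Python sketch. Its assumptions are ours: the classical part is given as ground Horn rules, forward chaining (closure) stands in for the resolution prover, and blocker contents are checked only one level deep, whereas GK additionally checks recursively that the checking proof is itself valid. All names are hypothetical.

```python
def closure(facts, rules):
    """Forward-chain ground Horn rules (body set, head) to a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived


def valid_answers(candidates, provable):
    """Keep the candidate answers none of whose blocker contents is provable.

    Each candidate is (answer, blockers) and each blocker is (priority, content),
    where content is the negated justification, e.g. (0, "-fly(p)") for
    block(0, neg(fly(p))). Priorities are carried along but unused here.
    """
    return [answer for answer, blockers in candidates
            if not any(content in provable for _priority, content in blockers)]
```

With the facts penguin(p) and bird(b) and the ground instances of the two penguin rules, only the candidate answer for b survives the check, matching the discussion above.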

Notice that the contents $\neg\beta_i$ of blockers, just like answer clauses, have the role
of collecting substitutions during the proof search: this enables us to disregard
the order in which the clauses are used, i.e. top-down, bottom-up and mixed
proof search strategies can all be used.

Importantly, blockers are used during the subsumption checks similarly to
ordinary literals. A clause $C_1$ with fewer or more general literals than $C_2$ is
hence always preferred to $C_2$, given that (a) the literals of $C_1$ subsume $C_2$,
disregarding the priority arguments of blockers, and (b) the priority arguments
of corresponding blocker literals in $C_1$ are equal to or stronger than those of $C_2$.
When combined with the uncertainty and inconsistency handling mechanisms of 
CONFER, the subsumption restrictions of the latter also apply. There are also 
other differences to ordinary literals. First, we prohibit the application of equality 
(demodulation or paramodulation) to the contents of blocker atoms during proof 
search. Second, we discard clauses containing mutually contradictory blockers 
(assuming the decoding of the second argument) like we would discard ordinary 
tautologies. 
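Restricted to ground clauses, where subsumption reduces to set inclusion (real subsumption also searches for a matching substitution), conditions (a) and (b) and the discarding of clauses with contradictory blockers can be sketched in Python as follows. The (literals, blockers) clause representation, integer priorities with larger meaning stronger, and all function names are assumptions of this sketch.

```python
def subsumes(c1, c2):
    """Ground-level sketch: may clause c1 be preferred to (subsume) clause c2?

    A clause is (literals, blockers): a frozenset of ordinary literals and a
    dict mapping blocker content to an integer priority (higher = stronger).
    """
    lits1, blk1 = c1
    lits2, blk2 = c2
    # (a) inclusion of literals and blocker contents, ignoring priorities
    if not lits1 <= lits2 or not blk1.keys() <= blk2.keys():
        return False
    # (b) each blocker of c1 must be at least as strong as its counterpart in c2
    return all(blk1[c] >= blk2[c] for c in blk1)


def complement(lit):
    """Complement of a literal in the toy encoding "p" / "-p"."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def has_contradictory_blockers(clause):
    """Clauses with complementary blocker contents are discarded like tautologies."""
    _lits, blk = clause
    return any(complement(c) in blk for c in blk)
```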


2.3 Priorities, Recursion and Infinite Branches 


Default rule priorities are critical for the practical encoding of commonsense 
knowledge. The usage of priorities in proof search is simple: when checking a 
blocker with a given priority, it is not allowed to use default rules with a lower 
priority. We encode priority information as a first argument of the blocker literal, 
offering several ways to determine priority: either as an integer, a taxonomy class 
number, a string in a taxonomy or a combination of these with an integer. 

For automatically using specificity we employ taxonomy classes: a class has
a higher priority than those above it on the taxonomy branch. We have built a
topologically sorted acyclic graph of English words using the WordNet taxonomy
along with an efficient algorithm for quick priority checks during proof search.
Taxonomy classes are indicated with a special term like $(61598). Alternatively 
one can use an actual English word like $(“bird”) which is automatically rec- 
ognized to be more specific than, say, $(“object”). To enable more fine-grained 
priorities, an integer can be added to the term like $(“bird”,2) generating a 
lexicographic order. 
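Specificity-based priority comparison can be sketched in Python over a toy taxonomy. The tuple ("bird", 2) plays the role of the term $("bird",2). The taxonomy itself, the naive ancestor walk (the text describes a dedicated algorithm for quick checks), and the treatment of incomparable classes as "not lower" are assumptions of this sketch.

```python
# Hypothetical toy taxonomy (child -> parent); WordNet is far larger and a DAG.
PARENT = {"penguin": "bird", "bird": "animal", "animal": "object"}


def ancestors(cls):
    """Classes above cls on its taxonomy branch, nearest first."""
    found = []
    while cls in PARENT:
        cls = PARENT[cls]
        found.append(cls)
    return found


def higher_priority(p1, p2):
    """Is priority term p1 = (class, n) strictly higher than p2?

    A more specific class wins; within the same class the integer decides,
    giving a lexicographic order. Unrelated classes are incomparable.
    """
    (c1, n1), (c2, n2) = p1, p2
    if c1 == c2:
        return n1 > n2
    return c2 in ancestors(c1)


def may_use(default_priority, blocker_priority):
    """A default may be used to check a blocker unless it has lower priority."""
    return not higher_priority(blocker_priority, default_priority)
```

For example, a "bird" default outranks an "object" default regardless of the integer part, so the non-flying rule for physical objects may not be used when checking a blocker created by the flying rule for birds.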

The recursive check for the non-provability of blockers can go arbitrarily deep
and is bounded only by the time limits. Our algorithm allocates N seconds for the whole proof
search and spends half of N for looking for different proofs and answers for the 
query, with the other half split evenly for each answer. Again, the time allocated 
for checking an answer is split evenly between the blockers in the answer. Each 
such time snippet is again split between a search for the proof of the blocker, and 
if found, for recursively checking the validity of this proof. Once the allocated 
time is below a given threshold (currently one millisecond) the proof is assumed 
to be not found. 
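The time allocation scheme can be made concrete with a Python sketch. The one-millisecond threshold is taken from the text; the function names, and the assumption that every proof found contains the same number of blockers, are illustrative only.

```python
THRESHOLD = 0.001  # one millisecond, as in the text


def per_blocker_slices(total, blockers_per_answer):
    """Top-level split of a budget of `total` seconds.

    Half goes to searching for answers; the other half is split evenly among
    the answers, and each answer's share evenly among its blockers.
    blockers_per_answer[i] is the number of blockers in answer i
    (an answer without blockers needs no checking time).
    """
    if not blockers_per_answer:
        return []
    per_answer = (total / 2) / len(blockers_per_answer)
    return [per_answer / k if k else 0.0 for k in blockers_per_answer]


def check_depth(slice_, blockers_per_proof=1, threshold=THRESHOLD):
    """Recursion depth reachable from one blocker's time slice.

    Half of a slice searches for a proof of the blocker, the other half checks
    that proof recursively, split among the blockers of that proof.
    """
    depth = 0
    while slice_ >= threshold:
        slice_ = (slice_ / 2) / blockers_per_proof
        depth += 1
    return depth
```

For instance, a one-second slice with one blocker per proof supports ten nested levels of checking before the slice drops below one millisecond (2^-10 s is just under 1 ms), which shows how quickly the halving bounds the recursion depth.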

Answers given by the system depend on the amount of time given, the search 
strategy chosen etc. For example, consider the Nixon triangle presented earlier, 
with two contradictory default rules. In case the priorities of these rules are equal 
and we allow defaults with the same priority to be used for checking an answer 
containing a blocker, the recursive check terminates only because of a time limit, 
which is unpredictable. Hence, we may sometimes get one answer and sometimes 
another. In order to increase both stability and efficiency, GK checks the blockers
in the search nodes above, and terminates with failure in case non-terminating
loops are detected. Therefore GK always gives a sceptical result to the Nixon
triangle: neither pacifist(nixon) nor ¬pacifist(nixon) is proven.


3 Confidences and Inconsistencies 


GK integrates the exception-handling algorithms described in the previous
section with the algorithms designed for handling inconsistent KB-s and numeric
confidences assigned to clauses, previously presented as the CONFER framework in
[18]. The framework is built on the resolution method. It calculates the estimates 
for the confidences of derived clauses, using both (a) the decreasing confidence of 
a conjunction of clauses as performed by the resolution and paramodulation rule, 
and (b) the increasing confidence of a disjunction of clauses for cumulating evi- 
dence. CONFER handles inconsistent KB-s by requiring the proofs of answers to 
contain the clauses stemming from the question posed. It performs searches both 
for the question and its negation and returns the resulting confidence calculated 
as a difference of the confidences found by these two searches. 

The integrated algorithm is more complex than the one we previously
described. Whenever the algorithms of the previous section speak about “proving”,
the system actually performs two independent searches (one for the positive
and one for the negated goal), with the confidences calculated for both
of these. A blocker is considered to be proved in case the resulting confidence
is over a pre-determined configurable threshold, by default 0.5. Blocker proofs 
must also contain the clause built from the blocker. Thus, the whole search tree
for a query consists of two types of interleaved layers: positive/negative confi- 
dence searches and blocker checking searches, the latter type potentially making 
the tree arbitrarily deep up to the minimal time limit threshold. 
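A minimal Python sketch of the confidence arithmetic follows. The text above only states that conjunction decreases and cumulation of evidence increases confidence; the concrete formulas used here (a product for a resolution step, 1 − ∏(1 − c) for independent proofs of the same clause) are assumptions of this sketch, as are the function names. The final check compares the difference of the positive and negative search results with the default threshold 0.5.

```python
from functools import reduce


def conjunction(*confidences: float) -> float:
    """Confidence of a clause derived by resolution: product of premises (assumed)."""
    return reduce(lambda acc, c: acc * c, confidences, 1.0)


def cumulate(*confidences: float) -> float:
    """Combined confidence of independent proofs of the same clause (assumed)."""
    return 1.0 - reduce(lambda acc, c: acc * (1.0 - c), confidences, 1.0)


def resulting_confidence(positive: float, negative: float) -> float:
    """Difference between the positive-goal and negated-goal search results."""
    return positive - negative


def blocker_proved(positive: float, negative: float, threshold: float = 0.5) -> bool:
    """A blocker counts as proved when the resulting confidence exceeds the threshold."""
    return resulting_confidence(positive, negative) > threshold
```

Under these assumptions, two independent proofs with confidence 0.5 each cumulate to 0.75, while a positive result of 0.6 against a negative one of 0.3 leaves 0.3, below the default threshold, so the blocker would not count as proved.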


4 Implementation and Experiments 


The described algorithms are implemented by the first author as a software
system GK available at https://logictools.org/gk/. GK is written in C on top of
our implementation of the CONFER framework [18], which is built on top of a
high-performance resolution prover GKC [17] (see https://github.com/tammet/gkc)
for conventional first order logic. Thus GK inherits most of the capabilities
and algorithms of GKC. 

A tutorial and a set of default logic example problems along with proofs 
from GK are also available at http://logictools.org/gk. GK is able to quickly 
solve nontrivial problems built by extending classic default logic examples. It 
is also able to solve classification problems combining exception and cumulative 
evidence and problems with dynamic situations using fluents, including planning 
problems. We have built a very large integrated knowledge base from the Quasi- 
modo [14] and ConceptNet [15] knowledge bases, converting these to default logic 
plus confidences. GK is able to solve simple problems using this large knowledge
base along with the WordNet taxonomy for specificity: see the referenced web
page for examples.

The following small example illustrates the fundamental difference of GK 
from the existing ASP systems for default logic. The standard penguins and 
birds example presented above in the ASP syntax is 


bird(b1).
penguin(p1).
bird(X) :- penguin(X).
flies(X) :- bird(X), not -flies(X).
-flies(X) :- penguin(X).


Both GK and the ASP systems clingo 5.4.0, dlv 2.1.1 and s(CASP) 0.21.10.09 
give an expected answer to the queries flies(b1) and flies(p1). However, 
when we add the rules 


bird(father(X)) :- bird(X). 
penguin(father(X)) :- penguin(X). 


none of these ASP systems terminate for these queries, while GK does solve 
the queries as expected. Notably, as pointed out by the author of s(CASP), this 
system does terminate for the reformulation of the same problem with the two 
replacement rules 


flies(X) :- bird(X), not abs(X). 
abs(X) :- penguin(X). 
while clingo and dlv do not terminate. When we instead add the facts and rules


father(b1,b2).
father(p1,p2).
...
father(bN-1,bN).
father(pN-1,pN).

ancestor(X,Y) :- father(X,Y).
ancestor(X,Y) :- ancestor(X,Z), ancestor(Z,Y).


for a large N, s(CASP) does not terminate and clingo and dlv become slow for
flies(b1): ca. 8 s for N = 500 and ca. 1 min for N = 1000 on a laptop with a
10th-generation i7 processor. GK solves the same question with N = 1000 in under
half a second and with N = 100000 in under three seconds: the latter problem size
is clearly beyond the capabilities of existing ASP systems.
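To experiment with the scaling behaviour reported above, a small generator for the ASP variant of the benchmark may be convenient. It is a hypothetical helper written in Python, not part of the GK distribution; it emits the birds and penguins program together with the father facts for b1, ..., bN and p1, ..., pN and the two ancestor rules.

```python
def birds_benchmark(n: int) -> str:
    """ASP text for the birds/penguins program plus father chains of length n."""
    lines = [
        "bird(b1).",
        "penguin(p1).",
        "bird(X) :- penguin(X).",
        "flies(X) :- bird(X), not -flies(X).",
        "-flies(X) :- penguin(X).",
    ]
    for i in range(1, n):  # father(b1,b2) ... father(b{n-1},b{n}), same for p
        lines.append(f"father(b{i},b{i + 1}).")
        lines.append(f"father(p{i},p{i + 1}).")
    lines.append("ancestor(X,Y) :- father(X,Y).")
    lines.append("ancestor(X,Y) :- ancestor(X,Z), ancestor(Z,Y).")
    return "\n".join(lines) + "\n"
```

For example, birds_benchmark(1000) produces the N = 1000 instance as a single text that can be saved to a file and passed to an ASP solver; the corresponding GK input is not generated here.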

We have previously shown that the confidence handling mechanisms in CON- 
FER may slow down proof search for certain types of problems, but do not have 
a strong negative effect on very large commonsense CYC [12] problems in the 
TPTP problem collection. Unlike CONFER, the algorithms for default
logic described above do not substantially modify the resolution method
implementation of pure first order logic search, thus the performance of these parts
of GK is mostly the same as that of GKC. The ability to give a correct answer to a
query within a given time limit depends on the performance of these components,
and not on the overall recursively branching algorithm.


5 Summary and Future Work 


We have presented algorithms and an implementation of an automated reason- 
ing system for default logic on the basis of unrestricted first order logic and a 
resolution method. While there are several systems able to solve default logic or 
similar nonmonotonic logic problems, these are built on the basis of answer set 
programming and are normally based on grounding. We are not aware of other
full first order logic reasoning systems for default logic, nor of systems
integrating confidences and inconsistency handling with rules with exceptions.

Future work is planned in three directions: adding features to the solver,
proving several useful properties of the algorithms and incorporating the solver 
into a commonsense reasoning system able to handle nontrivial tasks posed in 
natural language. The work on incorporating similarity-based reasoning into GK 
and building a suitable semantic parser for natural language is currently ongoing. 
We are particularly interested in exploring practical ways to integrate GK with 
the machine learning techniques for natural language. 
Abstract. To give concise explanations for a conclusion obtained by
reasoning over ontologies, justifications have been proposed as minimal 
subsets of an ontology that entail the given conclusion. Even though 
computing one justification can be done in polynomial time for tractable 
Description Logics such as EL+, computing all justifications is complicated
and often challenging for real-world ontologies. In this paper, based
on a graph representation of EL+-ontologies, we propose a new set of
inference rules (called H-rules) and take advantage of them for provid- 
ing a new method of computing all justifications for a given conclusion. 
The advantage of our setting is that most of the time, it reduces the 
number of inferences (generated by H-rules) required to derive a given 
conclusion. This accelerates the enumeration of justifications relying on 
these inferences. We validate our approach by running real-world ontology
experiments. Our graph-based approach outperforms PULi [14], the
state-of-the-art algorithm, in most cases.


1 Introduction 


Ontologies provide structured representations of domain knowledge that are suit- 
able for AI reasoning. They are used in various domains, including medicine, 
biology, and finance. In the domain of ontologies, one of the interesting topics is 
to provide explanations of reasoning conclusions. To this end, justifications have 
been proposed to offer users a brief explanation for a given conclusion. Comput- 
ing justifications has been widely explored for different tasks, for instance for 
debugging ontologies [1,9,11] and computing ontology modules [6]. Extracting 
just one justification can be easy for tractable ontologies, such as EL+ [17]. For
instance, we can find one justification by deleting unnecessary axioms one by 
one. However, there may exist more than one justification for a given conclu- 
sion. Computing all such justifications is computationally complex and reveals 
itself to be a challenging problem [18]. 

There are mainly two different approaches [17] to compute all justifications 
for a given conclusion, the black-box approach and the glass-box approach. 
The black-box approach [11] relies only on a reasoner and, as such, can be 
used for ontologies in any existing Description Logics. For example, a simple
(naive) black-box approach would check all the subsets of the ontology using an 
existing reasoner and then filter the subset-minimal ones (i.e., justifications). 
Many advanced and optimized black-box algorithms have been proposed since 
2007 [10]. Meanwhile, the glass-box approaches have achieved better
performance over certain specific ontology languages (such as EL+-ontologies) by going
deep into the reasoning process. Among them, the class of SAT-based methods 
[1-3, 14,16] performs the best. The main idea developed by SAT-based methods 
is to trace, in a first step, a complete set of inferences (complete set for short) 
that contribute to the derivation of a given conclusion, and then, in a second step, 
to use SAT-tools or resolution to extract all justifications from these inferences. 
A detailed example is provided in Sect. 4.1. 

In the real world, ontologies are often huge. For instance, the SnomedCT
ontology contains more than 300,000 axioms. Thus, the traced complete set can
be large, which could make it challenging to extract the justifications from it.
Several techniques could be applied to reduce the size of the traced complete set, 
like the locality-based modules [8] and the goal-directed tracing algorithm [12]. 
One of their shared ideas is to identify, for a given conclusion, a particular part of 
the ontology relevant for the extraction of justifications. For example, the state- 
of-the-art algorithm, PULi [14], uses a goal-directed tracing algorithm. However, 
even for PULi, a simple ontology O = {A; E Aj4i | 1 < i < n—1} with the 
conclusion Ap E An leads to a complete set containing n — 1 inferences. This set 
can not be reduced further even with the previously mentioned optimizations. 
From this observation, we decided to explore a new SAT-based glass-box method 
to handle such situations better. 

Now, let us look carefully at the ontology $\mathcal{O}$ above, and let us regard each $A_i$ as a graph node $N_{A_i}$. Then we are able to construct, for $\mathcal{O}$, a directed graph whose edges are of the form $N_{A_i} \to N_{A_{i+1}}$. It turns out that all the justifications for the conclusion $A_1 \sqsubseteq A_n$ are extracted from all the paths from $N_{A_1}$ to $N_{A_n}$, and here we have only one such path. We can easily extend this idea to $\mathcal{EL}^+$-ontologies because most $\mathcal{EL}^+$-axioms can be interpreted as directed edges, except in one case (i.e., $A \equiv B_1 \sqcap \cdots \sqcap B_m$), for which we need a hyperedge (for more details see Definition 3). However, for more expressive ontologies, this translation becomes more complicated. For example, it is hard to map $\mathcal{ALC}$-axioms to edges, as those axioms may contain negation or disjunction of concepts.
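For ontologies made only of atomic inclusions, the path view can be illustrated with a plain depth-first search. The sketch below is our own, with hypothetical names: it labels each edge $N_{A_i} \to N_{A_{i+1}}$ with the axiom that produces it and enumerates all simple paths. In such a graph, the label set of each simple path is a justification. For the chain ontology $\mathcal{O}$ above with $n = 6$, it finds a single path that uses all $n - 1$ axioms.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]


def simple_paths(edges: Dict[Edge, str], source: str, target: str) -> List[List[str]]:
    """Depth-first enumeration of all simple paths; returns the axiom labels on each."""
    successors: Dict[str, List[Tuple[str, str]]] = {}
    for (u, v), label in edges.items():
        successors.setdefault(u, []).append((v, label))

    paths: List[List[str]] = []

    def dfs(node: str, visited: FrozenSet[str], labels: List[str]) -> None:
        if node == target:
            paths.append(labels)
            return
        for nxt, label in successors.get(node, []):
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, labels + [label])

    dfs(source, frozenset({source}), [])
    return paths


n = 6
chain = {(f"A{i}", f"A{i + 1}"): f"A{i} ⊑ A{i + 1}" for i in range(1, n)}
print(simple_paths(chain, "A1", f"A{n}"))
```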

This example inspired us to explore a hypergraph representation of the ontology and to reformulate inferences and justifications. Roughly, our inferences are built from elementary paths of the hypergraph and lead to particular paths called H-paths. Then, all the justifications for a given conclusion are computed using such H-paths. For the previous ontology $\mathcal{O}$ and the conclusion $A_1 \sqsubseteq A_n$, our complete set is reduced to only two inferences (no matter the value of $n$), corresponding to the unique path from $N_{A_1}$ to $N_{A_n}$. The improvement provided by our method has two sources. On the one hand, elementary paths are pre-computed while extracting the inferences, and existing algorithms like depth-first search can compute such paths efficiently. On the other hand, and as a consequence, decreasing the size of the complete sets of inferences leads to smaller inputs for the SAT-based algorithm that extracts justifications from the complete set (recall that our method is a SAT-based glass-box method).

The paper is organized as follows. Section 2 introduces preliminary definitions and notions. In Sect. 3, we associate a hypergraph representation with an $\mathcal{EL}^+$-ontology and introduce a new set of inference rules, called H-rules, that generate our inferences. In Sect. 4, we develop the algorithm minH, which computes justifications based on our inferences. Section 5 shows experimental results, and Sect. 6 summarizes our work.


2 Preliminaries 


2.1 $\mathcal{EL}^+$-Ontology


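Given sets of atomic concepts $N_C = \{A, B, \ldots\}$ and atomic roles $N_R = \{r, s, t, \ldots\}$, the sets of $\mathcal{EL}^+$ concepts $C$ and axioms $\alpha$ are built by the following grammar rules:

$$C ::= \top \mid A \mid C \sqcap C \mid \exists r.C, \qquad \alpha ::= C \sqsubseteq C \mid C \equiv C \mid r \sqsubseteq s \mid r_1 \circ \cdots \circ r_n \sqsubseteq s.$$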


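An $\mathcal{EL}^+$-ontology $\mathcal{O}$ is a finite set of $\mathcal{EL}^+$-axioms. An interpretation $\mathcal{I} = (\Delta^\mathcal{I}, \cdot^\mathcal{I})$ of $\mathcal{O}$ consists of a non-empty set $\Delta^\mathcal{I}$ and a mapping from atomic concepts $A \in N_C$ to subsets $A^\mathcal{I} \subseteq \Delta^\mathcal{I}$ and from roles $r \in N_R$ to subsets $r^\mathcal{I} \subseteq \Delta^\mathcal{I} \times \Delta^\mathcal{I}$. For a concept $C$ built from the grammar rules, we define $C^\mathcal{I}$ inductively by: $\top^\mathcal{I} = \Delta^\mathcal{I}$, $(C \sqcap D)^\mathcal{I} = C^\mathcal{I} \cap D^\mathcal{I}$, $(\exists r.C)^\mathcal{I} = \{a \in \Delta^\mathcal{I} \mid \exists b, (a, b) \in r^\mathcal{I}, b \in C^\mathcal{I}\}$, and for roles $(r \circ s)^\mathcal{I} = \{(a, b) \in \Delta^\mathcal{I} \times \Delta^\mathcal{I} \mid \exists c, (a, c) \in r^\mathcal{I}, (c, b) \in s^\mathcal{I}\}$. An interpretation is a model of $\mathcal{O}$ if it is compatible with all axioms in $\mathcal{O}$, i.e., for all $C \sqsubseteq D$, $C \equiv D$, $r \sqsubseteq s$, $r_1 \circ \cdots \circ r_n \sqsubseteq s \in \mathcal{O}$, we have $C^\mathcal{I} \subseteq D^\mathcal{I}$, $C^\mathcal{I} = D^\mathcal{I}$, $r^\mathcal{I} \subseteq s^\mathcal{I}$, $(r_1 \circ \cdots \circ r_n)^\mathcal{I} \subseteq s^\mathcal{I}$, respectively. We say $\mathcal{O} \models \alpha$, where $\alpha$ is an axiom, iff each model of $\mathcal{O}$ is compatible with $\alpha$. A concept $A$ is subsumed by $B$ w.r.t. $\mathcal{O}$ if $\mathcal{O} \models A \sqsubseteq B$.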

Next, we use $A, B, \ldots, G$ (possibly with subscripts) to denote atomic concepts, and we use $X, Y, Z$ (possibly with subscripts) to denote either atomic concepts $A, \ldots, G$ or complex concepts $\exists r.A, \ldots, \exists r.G$.

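We assume that ontologies are normalized. An $\mathcal{EL}^+$-ontology $\mathcal{O}$ is normalized if all its axioms are of the form $A \equiv B_1 \sqcap \cdots \sqcap B_m$, $A \sqsubseteq B_1 \sqcap \cdots \sqcap B_m$, $A \equiv \exists r.B$, $A \sqsubseteq \exists r.B$, $r \sqsubseteq s$, or $r \circ s \sqsubseteq t$, where $A, B, B_i \in N_C$ and $r, s, t \in N_R$. Every $\mathcal{EL}^+$-ontology can be normalized in polynomial time by introducing new atomic concepts and atomic roles.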


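Example 1. The following set of axioms is an $\mathcal{EL}^+$-ontology:

$$\mathcal{O} = \{\alpha_1 : A \sqsubseteq D,\ \alpha_2 : D \sqsubseteq \exists r.E,\ \alpha_3 : E \sqsubseteq F,\ \alpha_4 : B \equiv \exists t.F,\ \alpha_5 : r \sqsubseteq t,\ \alpha_6 : G \equiv C \sqcap B,\ \alpha_7 : C \sqsubseteq A\}.$$

It is clear that $\mathcal{O} \models A \sqsubseteq \exists r.E$, as for all models $\mathcal{I}$ we have $A^\mathcal{I} \subseteq D^\mathcal{I}$ by the axiom $\alpha_1$, and $D^\mathcal{I} \subseteq (\exists r.E)^\mathcal{I}$ by $\alpha_2$.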
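Table 1. Inference rules over $\mathcal{EL}^+$-ontology.

$$R_1:\ \frac{A \sqsubseteq A_1,\ \ldots,\ A \sqsubseteq A_n,\ \ A_1 \sqcap \cdots \sqcap A_n \sqsubseteq B}{A \sqsubseteq B}
\qquad
R_2:\ \frac{A \sqsubseteq A_1,\ \ A_1 \sqsubseteq \exists r.B}{A \sqsubseteq \exists r.B}$$

$$R_3:\ \frac{A \sqsubseteq \exists r.B_1,\ \ B_1 \sqsubseteq B_2,\ \ \exists r.B_2 \sqsubseteq B}{A \sqsubseteq B}
\qquad
R_4:\ \frac{A_0 \sqsubseteq \exists r_1.A_1,\ \ldots,\ A_{n-1} \sqsubseteq \exists r_n.A_n,\ \ r_1 \circ \cdots \circ r_n \sqsubseteq r}{A_0 \sqsubseteq \exists r.A_n}$$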


2.2 Inference, Support and Justification 


Given an $\mathcal{EL}^+$-ontology $\mathcal{O}$, a major reasoning task over $\mathcal{O}$ is classification, which aims at finding all subsumptions $\mathcal{O} \models A \sqsubseteq B$ for atomic concepts $A, B$ occurring in $\mathcal{O}$. Generally, it can be solved by applying inferences recursively over $\mathcal{O}$ [5].

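An inference $\rho$ is a pair $(\rho_{pre}, \rho_{con})$ whose premise set $\rho_{pre}$ consists of $\mathcal{EL}^+$-axioms and whose conclusion $\rho_{con}$ is a single $\mathcal{EL}^+$-axiom. As usual, a sequence of inferences $\rho^1, \ldots, \rho^n$ is a derivation of an axiom $\alpha$ from $\mathcal{O}$ if $\rho^n_{con} = \alpha$ and, for any $\beta \in \rho^i_{pre}$ with $1 \le i \le n$, we have $\beta \in \mathcal{O}$ or $\beta = \rho^j_{con}$ for some $j < i$.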

As usual, inference rules are used to generate inferences. For instance, Table 1 [1,5] shows a set of inference rules for $\mathcal{EL}^+$-ontologies. Next, we use $\mathcal{O} \vdash A \sqsubseteq B$ to denote that $A \sqsubseteq B$ is derivable from $\mathcal{O}$ using inferences generated by the rules in Table 1. The set of inference rules in Table 1 is sound and complete for classification [5], i.e., $\mathcal{O} \vdash A \sqsubseteq B$ iff $\mathcal{O} \models A \sqsubseteq B$ for any $A, B \in N_C$.
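To make $\mathcal{O} \vdash A \sqsubseteq B$ concrete, the following is a naive forward-chaining sketch of Table 1 in Python, written by us for illustration only (it is neither ELK's nor the paper's implementation). The axiom encoding is hypothetical. Equivalences must first be split into inclusions, as in Example 2 (e.g., $\alpha_6$ becomes $G \sqsubseteq C$, $G \sqsubseteq B$ and $C \sqcap B \sqsubseteq G$). Also, $R_4$ is implemented only for $n = 1$, i.e., for role inclusions $r \sqsubseteq s$. On Example 1, the sketch derives both $A \sqsubseteq B$ and $G \sqsubseteq D$.

```python
from typing import Iterable, Set, Tuple, Union

# Hypothetical encoding: a right-hand side is either an atomic concept name
# (str) or an existential ("ex", r, B) standing for ∃r.B.
Rhs = Union[str, Tuple[str, str, str]]
Fact = Tuple[str, Rhs]  # (A, X) stands for A ⊑ X


def saturate(atomic: Iterable[Tuple[str, str]],
             sub_ex: Iterable[Tuple[str, str, str]],
             ex_sub: Iterable[Tuple[str, str, str]],
             conj: Iterable[Tuple[Tuple[str, ...], str]],
             role_sub: Iterable[Tuple[str, str]]) -> Set[Fact]:
    """Apply R1, R2, R3 and R4 (n = 1 only) until no new subsumption appears."""
    ex_sub, conj, role_sub = list(ex_sub), list(conj), list(role_sub)
    facts: Set[Fact] = set(atomic) | {(a, ("ex", r, b)) for a, r, b in sub_ex}
    while True:
        new: Set[Fact] = set()
        for a, x in facts:
            if isinstance(x, str):
                # R1 with n = 1 and R2: from A ⊑ A1 and A1 ⊑ Y derive A ⊑ Y.
                new |= {(a, y) for a1, y in facts if a1 == x}
            else:
                _, r, b1 = x
                # R4 with n = 1: A ⊑ ∃r.B1 and r ⊑ s give A ⊑ ∃s.B1.
                new |= {(a, ("ex", s, b1)) for r1, s in role_sub if r1 == r}
                # R3: A ⊑ ∃r.B1, B1 ⊑ B2 and ∃r.B2 ⊑ B give A ⊑ B.
                new |= {(a, b) for r2, b2, b in ex_sub
                        if r2 == r and (b1, b2) in facts}
        # R1 with n > 1: A ⊑ A_i for every conjunct of A_1 ⊓ ... ⊓ A_n ⊑ B.
        for a in {lhs for lhs, _ in facts}:
            for parts, b in conj:
                if all((a, p) in facts for p in parts):
                    new.add((a, b))
        if new <= facts:
            return facts
        facts |= new


# Example 1 after preprocessing the equivalences α4 and α6.
closure = saturate(
    atomic=[("A", "D"), ("E", "F"), ("G", "C"), ("G", "B"), ("C", "A")],
    sub_ex=[("D", "r", "E"), ("B", "t", "F")],
    ex_sub=[("t", "F", "B")],
    conj=[(("C", "B"), "G")],
    role_sub=[("r", "t")],
)
print(("A", "B") in closure, ("G", "D") in closure)
```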

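A support of $A \sqsubseteq B$ over $\mathcal{O}$ is a sub-ontology $\mathcal{O}' \subseteq \mathcal{O}$ such that $\mathcal{O}' \models A \sqsubseteq B$. The justifications for $A \sqsubseteq B$ are the subset-minimal supports of $A \sqsubseteq B$. We denote the collection of all justifications for $A \sqsubseteq B$ w.r.t. $\mathcal{O}$ by $\mathcal{J}_\mathcal{O}(A \sqsubseteq B)$.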

We say $S$ is a complete set (of inferences) for $A \sqsubseteq B$ if, for any justification $\mathcal{O}'$ of $A \sqsubseteq B$, we can derive $A \sqsubseteq B$ from $\mathcal{O}'$ using only the inferences in $S$.


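Example 2 (Example 1 cont'd). Before applying inference rules, the axioms in $\mathcal{O}$ are preprocessed in order to be compatible with Table 1. For example, $\alpha_4$ is replaced by $B \sqsubseteq \exists t.F$ and $\exists t.F \sqsubseteq B$. Then, according to the inference rules in Table 1, we may produce the following inferences: $\rho = (\{A \sqsubseteq D, D \sqsubseteq \exists r.E\}, A \sqsubseteq \exists r.E)$, $\rho' = (\{A \sqsubseteq \exists r.E, r \sqsubseteq t\}, A \sqsubseteq \exists t.E)$ and $\rho'' = (\{A \sqsubseteq \exists t.E, E \sqsubseteq F, \exists t.F \sqsubseteq B\}, A \sqsubseteq B)$, generated by the rules $R_2$, $R_4$ and $R_3$, respectively. Then $\mathcal{O} \vdash A \sqsubseteq B$, since $A \sqsubseteq B$ is derivable from $\mathcal{O}$ by the sequence $\rho, \rho', \rho''$.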

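Notice that $\mathcal{O}' = \{\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5\}$ is a support for $A \sqsubseteq B$, and thus any superset $\mathcal{O}''$ of $\mathcal{O}'$ is a support of $A \sqsubseteq B$. $\mathcal{O}'$ is also a justification for $A \sqsubseteq B$, as for any $\mathcal{O}'' \subsetneq \mathcal{O}'$ we have $\mathcal{O}'' \not\models A \sqsubseteq B$. Moreover, the three inferences $\rho, \rho', \rho''$ provide a complete set for $A \sqsubseteq B$.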



3 Hypergraph-Based Inference Rules 


3.1 H-Inferences 


In general, a (directed) hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is defined by a set of nodes $\mathcal{V}$ and a set of hyperedges $\mathcal{E}$ [4,7]. A hyperedge is of the form $e = (S_1, S_2)$ with $S_1, S_2 \subseteq \mathcal{V}$. In this paper, a hypergraph is associated with an ontology as follows:
Definition 3. For a given $\mathcal{EL}^+$-ontology $\mathcal{O}$, the associated hypergraph is $\mathcal{G}_\mathcal{O} = (\mathcal{V}_\mathcal{O}, \mathcal{E}_\mathcal{O})$, where (i) the set of nodes is $\mathcal{V}_\mathcal{O} = \{N_A, N_r, N_{\exists r.A} \mid A \in N_C, r \in N_R\}$ and (ii) the set of edges $\mathcal{E}_\mathcal{O}$ is defined by $f(\mathcal{O})$, where $f$ is the multi-valued mapping shown in Fig. 1. Given a hyperedge $e$ of $\mathcal{E}_\mathcal{O}$, the inverse image of $e$, $f^{-1}(e)$, is defined in the obvious manner. For a set $E$ of hyperedges, $f^{-1}(E) = \bigcup_{e \in E} f^{-1}(e)$.


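1. $A \sqsubseteq B_1 \sqcap \cdots \sqcap B_n \ \mapsto\ (\{N_A\}, \{N_{B_i}\}),\ 1 \le i \le n$
2. $A \equiv B_1 \sqcap \cdots \sqcap B_m \ \mapsto\ (\{N_A\}, \{N_{B_i}\}),\ 1 \le i \le m$, and $(\{N_{B_1}, \ldots, N_{B_m}\}, \{N_A\})$
3. $A \sqsubseteq \exists r.B \ \mapsto\ (\{N_A\}, \{N_{\exists r.B}\})$
4. $A \equiv \exists r.B \ \mapsto\ (\{N_{\exists r.B}\}, \{N_A\})$ and $(\{N_A\}, \{N_{\exists r.B}\})$
5. $r \sqsubseteq s \ \mapsto\ (\{N_r\}, \{N_s\})$ and $(\{N_{\exists r.A}\}, \{N_{\exists s.A}\})$ for all $A$
6. $r \circ s \sqsubseteq t \ \mapsto\ (\{N_r, N_s\}, \{N_s, N_t\})$

[Figure: graphical illustrations of $f(\alpha)$ for items 1 to 6.]

Fig. 1. Definition of $f$ (left) and graphical illustrations of $f(\alpha)$ (right)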


Notice that the hyperedges associated with $A \equiv B_1 \sqcap \cdots \sqcap B_m$ are (i) the hyperedge $(\{N_{B_1}, \ldots, N_{B_m}\}, \{N_A\})$ and (ii), of course, the edges corresponding to $A \sqsubseteq B_1 \sqcap \cdots \sqcap B_m$.
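Items 1 to 5 of Fig. 1 translate directly into a small Python function. In this sketch (our own; the tuple encoding of axioms and the string labels of nodes are assumptions), each hyperedge is returned together with the labels of the axioms that produce it, which is the information needed later for $f^{-1}$. Role chains (item 6) are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Hyperedge = Tuple[FrozenSet[str], FrozenSet[str]]


def build_hypergraph(axioms: Dict[str, tuple],
                     concept_names: Iterable[str]) -> Dict[Hyperedge, Set[str]]:
    """Return each hyperedge of f(O) with the labels of the axioms producing it.

    Hypothetical axiom encodings (items 1-5 of Fig. 1; role chains omitted):
      ("sub_conj", A, [B1, ..., Bn])  A ⊑ B1 ⊓ ... ⊓ Bn
      ("eq_conj", A, [B1, ..., Bm])   A ≡ B1 ⊓ ... ⊓ Bm
      ("sub_ex", A, r, B)             A ⊑ ∃r.B
      ("eq_ex", A, r, B)              A ≡ ∃r.B
      ("role", r, s)                  r ⊑ s
    Node labels: "A" for N_A, "role:r" for N_r, "∃r.A" for N_∃r.A.
    """
    names = list(concept_names)
    preimage: Dict[Hyperedge, Set[str]] = defaultdict(set)

    def add(src: Iterable[str], tgt: Iterable[str], label: str) -> None:
        preimage[(frozenset(src), frozenset(tgt))].add(label)

    for label, ax in axioms.items():
        kind = ax[0]
        if kind in ("sub_conj", "eq_conj"):
            _, a, parts = ax
            for b in parts:
                add([a], [b], label)
            if kind == "eq_conj":
                add(parts, [a], label)  # the hyperedge of item 2
        elif kind in ("sub_ex", "eq_ex"):
            _, a, r, b = ax
            add([a], [f"∃{r}.{b}"], label)
            if kind == "eq_ex":
                add([f"∃{r}.{b}"], [a], label)
        elif kind == "role":
            _, r, s = ax
            add([f"role:{r}"], [f"role:{s}"], label)
            for a in names:  # one edge N_∃r.A -> N_∃s.A per concept name
                add([f"∃{r}.{a}"], [f"∃{s}.{a}"], label)
        else:
            raise ValueError(f"unsupported axiom kind: {kind}")
    return dict(preimage)


example1 = {
    "a1": ("sub_conj", "A", ["D"]), "a2": ("sub_ex", "D", "r", "E"),
    "a3": ("sub_conj", "E", ["F"]), "a4": ("eq_ex", "B", "t", "F"),
    "a5": ("role", "r", "t"), "a6": ("eq_conj", "G", ["C", "B"]),
    "a7": ("sub_conj", "C", ["A"]),
}
G = build_hypergraph(example1, "ABCDEFG")
print(len(G))
```

For Example 1, the function produces 17 hyperedges. For instance, $(\{N_C\}, \{N_A\})$ maps back to $\alpha_7$, and $r \sqsubseteq t$ alone contributes one role edge plus seven edges $N_{\exists r.X} \to N_{\exists t.X}$, one per concept name.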


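Example 4 (Example 1 cont'd). The hypergraph $\mathcal{G}_\mathcal{O}$ for $\mathcal{O}$ is shown in Fig. 2, where $e_0 = (\{N_C\}, \{N_A\})$, $e_1 = (\{N_A\}, \{N_D\})$, $e_2 = (\{N_D\}, \{N_{\exists r.E}\})$, etc. Also, $f^{-1}(e_0) = C \sqsubseteq A$, $f^{-1}(e_1) = A \sqsubseteq D$, $f^{-1}(e_2) = D \sqsubseteq \exists r.E$, etc.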


[Figure: drawing of the hypergraph $\mathcal{G}_\mathcal{O}$; nodes $N_{\exists r.X}$ and $N_{\exists t.X}$ are drawn for $X \in \{A, B, C, D, G\}$.]

Fig. 2. The hypergraph associated with the ontology $\mathcal{O}$.


As for graphs, a path (next called regular path) from node $N_1$ to node $N_2$ in a hypergraph is a sequence of edges:

$$e_0 = (S_1^0, S_2^0),\ e_1 = (S_1^1, S_2^1),\ \ldots,\ e_n = (S_1^n, S_2^n) \qquad (1)$$

where $N_1 \in S_1^0$, $N_2 \in S_2^n$ and $S_2^{i-1} = S_1^i$ for $1 \le i \le n$. Next, the existence of a regular path from $N_X$ to $N_Y$ in a hypergraph $\mathcal{G}_\mathcal{O}$ is denoted by $N_X \rightsquigarrow N_Y$. Now, we introduce hypergraph-based inferences, which are based on the existence of regular paths, as follows:
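Table 2. H-rules over $\mathcal{G}_\mathcal{O} = (\mathcal{V}_\mathcal{O}, \mathcal{E}_\mathcal{O})$.

$$H_0:\ \frac{N_X \rightsquigarrow N_Y}{N_X \overset{h}{\rightsquigarrow} N_Y}
\qquad
H_2:\ \frac{N_X \overset{h}{\rightsquigarrow} N_{\exists r.B_1},\ \ N_{B_1} \overset{h}{\rightsquigarrow} N_{B_2},\ \ N_{\exists r.B_2} \rightsquigarrow N_Y}{N_X \overset{h}{\rightsquigarrow} N_Y}$$

$$H_1:\ \frac{N_X \overset{h}{\rightsquigarrow} N_{B_1},\ \ldots,\ N_X \overset{h}{\rightsquigarrow} N_{B_m},\ \ N_A \rightsquigarrow N_Y,\ \ e}{N_X \overset{h}{\rightsquigarrow} N_Y}\ :\ e = (\{N_{B_1}, \ldots, N_{B_m}\}, \{N_A\}) \in \mathcal{E}_\mathcal{O}$$

$$H_3:\ \frac{N_X \overset{h}{\rightsquigarrow} N_{\exists r.A_1},\ \ N_{A_1} \overset{h}{\rightsquigarrow} N_{\exists s.A_2},\ \ N_{\exists t.A_2} \rightsquigarrow N_Y,\ \ e}{N_X \overset{h}{\rightsquigarrow} N_Y}\ :\ e = (\{N_r, N_s\}, \{N_s, N_t\}) \in \mathcal{E}_\mathcal{O}$$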


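Definition 5. Given a hypergraph $\mathcal{G}_\mathcal{O}$, Table 2 gives a set of inference rules called H-rules. Inferences based on H-rules are called H-inferences. Next, we denote by $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_Y$ (or simply $N_X \overset{h}{\rightsquigarrow} N_Y$) the fact that $N_X \overset{h}{\rightsquigarrow} N_Y$ can be derived from $\mathcal{G}_\mathcal{O}$ using the H-inferences.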


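Example 6 (Example 4 cont'd). As shown in Fig. 2, we have $N_A \rightsquigarrow N_{\exists t.E}$, $N_E \rightsquigarrow N_F$ and $N_{\exists t.F} \rightsquigarrow N_B$ from the existence of regular paths. Then we can derive $N_A \overset{h}{\rightsquigarrow} N_B$ from $\mathcal{G}_\mathcal{O}$ by the H-rules $H_0$, $H_0$ and $H_2$, which generate the H-inferences $\rho^1, \rho^2, \rho^3$, where $\rho^1 = (\{N_A \rightsquigarrow N_{\exists t.E}\}, N_A \overset{h}{\rightsquigarrow} N_{\exists t.E})$, $\rho^2 = (\{N_E \rightsquigarrow N_F\}, N_E \overset{h}{\rightsquigarrow} N_F)$ and $\rho^3 = (\{N_A \overset{h}{\rightsquigarrow} N_{\exists t.E}, N_E \overset{h}{\rightsquigarrow} N_F, N_{\exists t.F} \rightsquigarrow N_B\}, N_A \overset{h}{\rightsquigarrow} N_B)$, respectively.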
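Regular paths such as those used in Example 6 can be enumerated by a depth-first search over the hyperedges, chaining an edge $e_i$ after $e_{i-1}$ only when $S_2^{i-1} = S_1^i$. The sketch below is hypothetical: edge names are illustrative, and we assume that a path starts with an edge whose source set is exactly $\{N_1\}$, so that a conjunction hyperedge such as $(\{N_C, N_B\}, \{N_G\})$ never yields $N_C \rightsquigarrow N_G$ (a subsumption $C \sqsubseteq G$ that $\mathcal{O}$ does not entail).

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Hyperedge = Tuple[FrozenSet[str], FrozenSet[str]]


def regular_paths(edges: Dict[str, Hyperedge], n1: str, n2: str) -> List[Tuple[str, ...]]:
    """Enumerate simple regular paths from node n1 to node n2 (Eq. (1)).

    Assumption of this sketch: the first edge has source set exactly {n1};
    each next edge must have source set equal to the previous target set.
    """
    found: List[Tuple[str, ...]] = []

    def extend(path: List[str], last_target: FrozenSet[str],
               seen: Set[FrozenSet[str]]) -> None:
        if n2 in last_target:
            found.append(tuple(path))
            return
        for name, (src, tgt) in edges.items():
            if src == last_target and tgt not in seen:
                extend(path + [name], tgt, seen | {tgt})

    start = frozenset({n1})
    for name, (src, tgt) in edges.items():
        if src == start and tgt != start:
            extend([name], tgt, {start, tgt})
    return found


edges = {
    "e0": (frozenset({"C"}), frozenset({"A"})),
    "e1": (frozenset({"A"}), frozenset({"D"})),
    "e2": (frozenset({"D"}), frozenset({"∃r.E"})),
    "e3": (frozenset({"∃r.E"}), frozenset({"∃t.E"})),
    "g1": (frozenset({"G"}), frozenset({"C"})),
    "h1": (frozenset({"C", "B"}), frozenset({"G"})),  # conjunction hyperedge
}
print(regular_paths(edges, "G", "D"))
```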


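Note that the first rule $H_0$, the initialization rule, makes regular paths the elementary components of H-rules. Moreover, Proposition 7 formally states that, in our H-inference system, we do not need to add the transitive inference rule:

$$\frac{N_X \overset{h}{\rightsquigarrow} N_Z,\quad N_Z \overset{h}{\rightsquigarrow} N_Y}{N_X \overset{h}{\rightsquigarrow} N_Y}$$

Proposition 7. If $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_Z$ and $\mathcal{O} \vdash_H N_Z \overset{h}{\rightsquigarrow} N_Y$, then $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_Y$.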


3.2 Completeness and Soundness of H-Inferences 


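The following theorem is the main result of this section. It states the equivalence between the derivation of $N_X \overset{h}{\rightsquigarrow} N_Y$ (by Table 2) and the ontology entailment of $X \sqsubseteq Y$, and thus shows that our H-rules are sound and complete for $\mathcal{EL}^+$-ontologies.

Theorem 8. If $\mathcal{O}$ is an $\mathcal{EL}^+$-ontology, then $\mathcal{O} \models X \sqsubseteq Y$ iff $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_Y$, where $X, Y$ are concepts of either form $A$ or $\exists r.B$.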


Proof. “⇐” is obvious by induction over Table 2 and the fact that $N_X \rightsquigarrow N_Y$ implies $\mathcal{O} \models X \sqsubseteq Y$, so we only need to prove the direction “⇒”. Assume that $\mathcal{O} \models X \sqsubseteq Y$. We consider two cases:
Case 1. We assume $\mathcal{O} \vdash X \sqsubseteq Y$¹. Let $d(X, Y)$ be the length of a shortest derivation of $X \sqsubseteq Y$ from $\mathcal{O}$ using Table 1. We prove “⇒” by induction on $d(X, Y)$.


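- Assume $d(X, Y) = 0$. In this case $\mathcal{O}$ must contain an axiom of the form $X \equiv Y \sqcap \cdots$ or $X \sqsubseteq Y \sqcap \cdots$. Clearly we have $N_X \rightsquigarrow N_Y$, thus $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_Y$.
- Assuming “⇒” holds when $d(X, Y) < k$, let us prove that “⇒” holds for $d(X, Y) = k$. Suppose $\rho^{last}$ is the last inference in a shortest derivation of $X \sqsubseteq Y$ using Table 1. Two cases arise:
1. Assume $\rho^{last}$ is generated by $R_1$ ($n > 1$), $R_3$ or $R_4$ ($n = 2$). For example, assume $\rho^{last} = (\{X \sqsubseteq \exists r.B_1, B_1 \sqsubseteq B_2, \exists r.B_2 \sqsubseteq Y\}, X \sqsubseteq Y)$ comes from $R_3$. We have $d(X, \exists r.B_1), d(B_1, B_2), d(\exists r.B_2, Y) < k$ because their corresponding subsumptions can be derived without $\rho^{last}$. By the induction hypothesis, $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_{\exists r.B_1}$, $N_{B_1} \overset{h}{\rightsquigarrow} N_{B_2}$ and $N_{\exists r.B_2} \overset{h}{\rightsquigarrow} N_Y$. Then we have $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_{\exists r.B_2}$ by first deriving $N_X \overset{h}{\rightsquigarrow} N_{\exists r.B_1}$ and $N_{B_1} \overset{h}{\rightsquigarrow} N_{B_2}$, and then applying the H-inference
$$\rho_h = (\{N_X \overset{h}{\rightsquigarrow} N_{\exists r.B_1},\ N_{B_1} \overset{h}{\rightsquigarrow} N_{B_2},\ N_{\exists r.B_2} \rightsquigarrow N_{\exists r.B_2}\},\ N_X \overset{h}{\rightsquigarrow} N_{\exists r.B_2}).$$
Then $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_Y$ by Proposition 7, since $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_{\exists r.B_2}$ and $\mathcal{O} \vdash_H N_{\exists r.B_2} \overset{h}{\rightsquigarrow} N_Y$. The argument also holds for $R_1$ ($n > 1$) (or $R_4$ ($n = 2$)) by applying $H_1$ (or $H_3$) instead of $H_2$.
2. Assume $\rho^{last}$ is generated by $R_1$ ($n = 1$), $R_2$ or $R_4$ ($n = 1$). Then, in each case, $\rho^{last}$ has the form $(\{X \sqsubseteq Z, Z \sqsubseteq Y\}, X \sqsubseteq Y)$. As in case 1, we have $d(X, Z), d(Z, Y) < k$. By the induction hypothesis, $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_Z$ and $\mathcal{O} \vdash_H N_Z \overset{h}{\rightsquigarrow} N_Y$; then $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_Y$ by Proposition 7.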


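Case 2. If $\mathcal{O} \vdash X \sqsubseteq Y$ does not hold, then $X$ or $Y$ is not atomic. In this case, we introduce new axioms $A \equiv X$, $B \equiv Y$ with new atomic concepts $A, B$ and denote the extended ontology by $\mathcal{O}'$. Clearly, $\mathcal{O}' \models A \sqsubseteq B$, and thus $\mathcal{O}' \vdash A \sqsubseteq B$ since Table 1 is sound and complete. Therefore, we have $\mathcal{O}' \vdash_H N_A \overset{h}{\rightsquigarrow} N_B$ by the same arguments as above. Now, notice that $\mathcal{G}_{\mathcal{O}'}$ is obtained from $\mathcal{G}_\mathcal{O}$ by adding 4 edges: $(\{N_A\}, \{N_X\})$, $(\{N_X\}, \{N_A\})$, $(\{N_B\}, \{N_Y\})$ and $(\{N_Y\}, \{N_B\})$; thus we have $\mathcal{O}' \vdash_H N_A \overset{h}{\rightsquigarrow} N_B$ iff $\mathcal{O} \vdash_H N_X \overset{h}{\rightsquigarrow} N_Y$.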


3.3 Extracting Justifications from $\mathcal{G}_\mathcal{O}$


Now, we formally define H-paths as a hypergraph representation of classical derivations based on H-rules. Note that H-paths are not classical hyperpaths [7]. Next, for the sake of homogeneity, we consider a regular path from $N_X$ to $N_Y$ as the set of its edges and denote it by $P_{X,Y}$.

1 The reader should recall that the equivalence ($\mathcal{O} \models X \sqsubseteq Y$ iff $\mathcal{O} \vdash X \sqsubseteq Y$) only holds when $X$ and $Y$ are atomic concepts w.r.t. the inference system presented in Table 1.
Definition 9 (H-paths). In the hypergraph $\mathcal{G}_\mathcal{O}$, an H-path $H_{X,Y}$ from $N_X$ to $N_Y$ is a set of edges recursively generated by the following composition rules:

0. A regular path $P_{X,Y}$ is an H-path from $N_X$ to $N_Y$;

1. If $e = (\{N_{B_1}, \ldots, N_{B_m}\}, \{N_A\}) \in \mathcal{E}_\mathcal{O}$, if $H_{X,B_i}$ are H-paths for $i = 1, \ldots, m$, and if $P_{A,Y}$ is a regular path, then $H_{X,B_1} \cup \cdots \cup H_{X,B_m} \cup P_{A,Y} \cup \{e\}$ is an H-path from $N_X$ to $N_Y$;

2. If $H_{X,\exists r.B_1}$ and $H_{B_1,B_2}$ are H-paths and $P_{\exists r.B_2,Y}$ is a regular path, then $H_{X,\exists r.B_1} \cup H_{B_1,B_2} \cup P_{\exists r.B_2,Y}$ is an H-path from $N_X$ to $N_Y$;

3. If $e = (\{N_r, N_s\}, \{N_s, N_t\}) \in \mathcal{E}_\mathcal{O}$, if $H_{X,\exists r.A_1}$ and $H_{A_1,\exists s.A_2}$ are H-paths, and if $P_{\exists t.A_2,Y}$ is a regular path, then $H_{X,\exists r.A_1} \cup H_{A_1,\exists s.A_2} \cup P_{\exists t.A_2,Y} \cup \{e\}$ is an H-path from $N_X$ to $N_Y$.


Fig. 3. Structure of H-paths from $N_X$ to $N_Y$


Figure 3 gives an illustration of H-paths: the blue arrows $\rightsquigarrow$ correspond to regular paths, and the red ones $\overset{h}{\rightsquigarrow}$ to H-paths. It is straightforward to compare the composition rules building H-paths with the H-rules building derivations in Table 2. One may also consider H-paths as derivation trees with leaves corresponding to the edges in $\mathcal{G}_\mathcal{O}$. However, our approach provides a more direct characterization of justifications, as shown in Theorem 10.

We say that an H-path $H_{X,Y}$ is minimal if there is no H-path $H'_{X,Y}$ such that $H'_{X,Y} \subsetneq H_{X,Y}$.

Now, we are ready to explain how H-paths and justifications are related. We can compute justifications from minimal H-paths as stated below:

Theorem 10. Let $X, Y$ be concepts of either form $A$ or $\exists r.B$, and let

$$S = \{f^{-1}(H_{X,Y}) \mid H_{X,Y} \text{ is a minimal H-path from } N_X \text{ to } N_Y\}.$$

Then $\mathcal{J}_\mathcal{O}(X \sqsubseteq Y) = \{s \in S \mid s' \not\subsetneq s \text{ for all } s' \in S\}$. That is, the justifications for $X \sqsubseteq Y$ are exactly the minimal sets in $S$.
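Proof. For any justification $\mathcal{O}'$ of $X \sqsubseteq Y$, there exists a minimal H-path $H_{X,Y}$ such that $\mathcal{O}' = f^{-1}(H_{X,Y})$. The reason is that, since $\mathcal{O}' \models X \sqsubseteq Y$, there exists an H-path $H_{X,Y}$ from $N_X$ to $N_Y$ on $\mathcal{G}_{\mathcal{O}'}$ by Theorem 8. Without loss of generality, we can assume $H_{X,Y}$ is minimal on $\mathcal{G}_{\mathcal{O}'}$; then it is also minimal on $\mathcal{G}_\mathcal{O}$, since $\mathcal{G}_{\mathcal{O}'}$ is a sub-graph of $\mathcal{G}_\mathcal{O}$. We have $\mathcal{O}' = f^{-1}(H_{X,Y})$, because otherwise there exists $\mathcal{O}'' \subsetneq \mathcal{O}'$ such that $\mathcal{O}'' = f^{-1}(H_{X,Y})$, and thus $\mathcal{O}'' \models X \sqsubseteq Y$ by Theorem 8 again. Then $\mathcal{O}'$ would not be a justification, a contradiction.

Now, we know that $S$ contains all justifications for $X \sqsubseteq Y$. Moreover, $f^{-1}(H_{X,Y}) \models X \sqsubseteq Y$ for any H-path $H_{X,Y}$. Therefore, we have $\mathcal{J}_\mathcal{O}(X \sqsubseteq Y) = \{s \in S \mid s' \not\subsetneq s \text{ for all } s' \in S\}$ by the definition of justifications.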
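Theorem 10 suggests a simple post-processing step: map each minimal H-path through $f^{-1}$ and discard every image that strictly contains another. The sketch below is our own, and the edge-to-axiom table `preimage` is hypothetical. It shows why the final filter is needed: two H-paths that are incomparable as edge sets can map to comparable axiom sets when one axiom contributes several edges.

```python
from typing import Dict, FrozenSet, Iterable, Set


def minimal_sets(candidates: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Keep the sets that have no proper subset among the candidates."""
    unique = {frozenset(c) for c in candidates}
    return {s for s in unique if not any(t < s for t in unique)}


def justifications_from_hpaths(hpaths: Iterable[Iterable[str]],
                               preimage: Dict[str, Set[str]]) -> Set[FrozenSet[str]]:
    """Map each minimal H-path back through f^{-1}, then apply Theorem 10."""
    images = [frozenset().union(*(preimage[e] for e in h)) for h in hpaths]
    return minimal_sets(images)


# Hypothetical data: axiom a2 produces both e2 and e3.
preimage = {"e1": {"a1"}, "e2": {"a2"}, "e3": {"a2"}, "e4": {"a3"}}
hpaths = [{"e1", "e2"}, {"e1", "e3", "e4"}]
print(justifications_from_hpaths(hpaths, preimage))
```

Here the second H-path maps to $\{a_1, a_2, a_3\}$, which strictly contains the image $\{a_1, a_2\}$ of the first one, so only $\{a_1, a_2\}$ is kept.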


Example 11 (Example 4 cont'd). The regular paths from $N_A$ to $N_{\exists t.E}$ and from $N_E$ to $N_F$ produce two H-paths $H_{A,\exists t.E} = \{e_1, e_2, e_3\}$ and $H_{E,F} = \{e_4\}$. Then, applying composition rule 2 with $H_{A,\exists t.E}$, $H_{E,F}$ and $P_{\exists t.F,B} = \{e_6\}$, we get $H_{A,B} = \{e_1, e_2, e_3, e_4, e_6\}$, which is the unique H-path from $N_A$ to $N_B$. Thus, by Theorem 10, $\{\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5\}$ is the only justification for $A \sqsubseteq B$.


4 Implementation: Computing Justifications 


4.1 SAT-Based Method 


In this section, we briefly describe how PULi [14], the state-of-the-art glass-box algorithm, proceeds. Given an ontology $\mathcal{O}$, computing $\mathcal{J}_\mathcal{O}(X \sqsubseteq Y)$ is done in two steps: (1) tracing a complete set for $X \sqsubseteq Y$, and (2) using resolution to extract the justifications from the complete set. The following example illustrates both steps:


Example 12 (Example 1 cont'd). Let us compute $\mathcal{J}_\mathcal{O}(G \sqsubseteq D)$ using PULi's method.

1. Using the goal-directed tracing algorithm in [12], the first step produces a complete set of inferences² $\{\rho_1, \rho_2\}$ for $G \sqsubseteq D$, where $\rho_1 = (\{G \sqsubseteq C, C \sqsubseteq A\}, G \sqsubseteq A)$ and $\rho_2 = (\{G \sqsubseteq A, A \sqsubseteq D\}, G \sqsubseteq D)$.

2. This step is again composed of two parts:

(a) The first part translates the inferences into clauses. Let us denote $\bar{p}_1 : G \sqsubseteq C$, $\bar{p}_2 : C \sqsubseteq A$, $\bar{p}_3 : A \sqsubseteq D$, $p_4 : G \sqsubseteq A$, $p_5 : G \sqsubseteq D$. Here the literals $\bar{p}_1, \bar{p}_2, \bar{p}_3$ (with a bar) are called answer literals, as they correspond to the axioms $\alpha_6, \alpha_7, \alpha_1$ in $\mathcal{O}$. Thus, we obtain $\mathcal{C} = \{\neg\bar{p}_1 \vee \neg\bar{p}_2 \vee p_4,\ \neg p_4 \vee \neg\bar{p}_3 \vee p_5\}$ by rewriting the inferences $\rho_1, \rho_2$ as clauses.

(b) Secondly, a new clause $\neg p_5$ is added to $\mathcal{C}$, where $p_5$ corresponds to the conclusion $G \sqsubseteq D$, and resolution is applied over $\mathcal{C}$. The set of all justifications $\mathcal{J}_\mathcal{O}(G \sqsubseteq D)$ is obtained by considering (i) the clauses formed of

2 For the sake of simplicity, we use the inference rules in Table 1, although PULi uses a slightly different set of inference rules [13].
Algorithm 1: minH
input : $X \sqsubseteq Y$
output: $J$: $\mathcal{J}_\mathcal{O}(X \sqsubseteq Y)$
1  J ← ∅;
2  U ← CompleteH($N_X \overset{h}{\rightsquigarrow} N_Y$);
3  min_hpaths ← resolution(clauses(U));
4  for h ∈ min_hpaths do
5      if there is no h' ∈ min_hpaths with $f^{-1}(h') \subsetneq f^{-1}(h)$ then
6          J.add($f^{-1}(h)$)
7      end
8  end


answer literals only and (ii) keeping the minimal ones among them³. In this example, after the resolution phase, the only clause that consists merely of answer literals is $\neg\bar{p}_1 \vee \neg\bar{p}_2 \vee \neg\bar{p}_3$. Thus, the set of all justifications is $\mathcal{J}_\mathcal{O}(G \sqsubseteq D) = \{\{\alpha_1, \alpha_6, \alpha_7\}\}$.
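The resolution step of Example 12 can be reproduced with a naive saturation. This sketch is ours and deliberately simple. It resolves only on non-answer atoms, skips tautologies, and keeps the minimal clauses made of answer literals. PULi's actual resolution uses dedicated strategies (threshold, top down, bottom up; see Sect. 5) and is far more efficient. The atoms p1, p2, p3 play the role of the answer literals $\bar{p}_1, \bar{p}_2, \bar{p}_3$.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Lit = Tuple[str, bool]       # (atom, polarity); False means negated
Clause = FrozenSet[Lit]


def answer_clauses(clauses: Iterable[Clause], goal: str,
                   answers: Set[str]) -> Set[Clause]:
    """Saturate by resolving on non-answer atoms; keep minimal answer-only clauses."""
    work: Set[Clause] = set(clauses) | {frozenset({(goal, False)})}
    changed = True
    while changed:
        changed = False
        for c1, c2 in product(list(work), repeat=2):
            for atom, positive in c1:
                if positive and atom not in answers and (atom, False) in c2:
                    resolvent = (c1 - {(atom, True)}) | (c2 - {(atom, False)})
                    if any((a, not p) in resolvent for a, p in resolvent):
                        continue  # skip tautologies
                    if resolvent not in work:
                        work.add(resolvent)
                        changed = True
    only_answers = {c for c in work if all(a in answers for a, _ in c)}
    return {c for c in only_answers if not any(d < c for d in only_answers)}


def to_justifications(found: Set[Clause],
                      axiom_of: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Read each answer-only clause back as a set of axioms."""
    return {frozenset(axiom_of[a] for a, _ in c) for c in found}


# Example 12: p1, p2, p3 are the answer atoms for G ⊑ C, C ⊑ A, A ⊑ D.
neg, pos = False, True
clauses = [frozenset({("p1", neg), ("p2", neg), ("p4", pos)}),
           frozenset({("p4", neg), ("p3", neg), ("p5", pos)})]
found = answer_clauses(clauses, "p5", {"p1", "p2", "p3"})
print(to_justifications(found, {"p1": "a6", "p2": "a7", "p3": "a1"}))
```

The only surviving clause is $\neg\bar{p}_1 \vee \neg\bar{p}_2 \vee \neg\bar{p}_3$, which maps back to $\{\alpha_1, \alpha_6, \alpha_7\}$.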


Our method for computing justifications follows the same steps as PULi, the major difference being that the first step computes a complete set of H-inferences instead of a complete set of inferences w.r.t. Table 1.


4.2 Computing Justifications by Minimal H-Paths


In this section, given an ontology $\mathcal{O}$ and its associated hypergraph $\mathcal{G}_\mathcal{O}$, we present minH (Algorithm 1), which computes all justifications for $X_0 \sqsubseteq Y_0$ using the minimal H-paths from $N_{X_0}$ to $N_{Y_0}$ over $\mathcal{G}_\mathcal{O}$. The algorithm minH proceeds in two steps, described below.


Step 1. First, at Line 2, minH computes a complete set of inferences U for $N_{X_0} \overset{h}{\rightsquigarrow} N_{Y_0}$ using CompleteH (see Algorithm 2). Here, U is complete in the sense that, for any H-path $H_{X_0,Y_0}$, we can derive $N_{X_0} \overset{h}{\rightsquigarrow} N_{Y_0}$ from the edge set $H_{X_0,Y_0}$ using inferences in U. CompleteH computes U as follows:


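- Lines 3–12 of Algorithm 2: The repeated application of trace_one_turn (see Algorithm 3) collects all H-inferences whose conclusion is the given input $N_{X_0} \overset{h}{\rightsquigarrow} N_{Y_0}$ or, recursively, one of the premises of the form $N_{X'} \overset{h}{\rightsquigarrow} N_{Y'}$ encountered along the way;

- Lines 13–17 of Algorithm 2: Let path be the depth-first search algorithm that computes all regular paths from $N_X$ to $N_Y$ in $\mathcal{G}_\mathcal{O}$ with input $(N_X, N_Y)$. For every premise $N_X \rightsquigarrow N_Y$ appearing in U, one inference is added per regular path, with the edges of that path as premises. Intuitively, the purpose is to shift inferences from regular paths to edges.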
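The worklist structure of CompleteH can be sketched in Python. In this hypothetical sketch, `trace_one_turn` and `paths` are passed in as callables rather than implemented (they stand for Algorithm 3 and the depth-first search path). Goals are tuples `("h", X, Y)` for $N_X \overset{h}{\rightsquigarrow} N_Y$ and `("r", X, Y)` for $N_X \rightsquigarrow N_Y$, and the toy data loosely mirrors Examples 6 and 11 with illustrative edge names.

```python
from collections import deque
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Goal = Tuple[str, str, str]         # ("h", X, Y) or ("r", X, Y)
Inference = Tuple[FrozenSet, Goal]  # (premises, conclusion)


def is_goal(p: object, kind: str) -> bool:
    return isinstance(p, tuple) and len(p) == 3 and p[0] == kind


def complete_h(goal: Goal,
               trace_one_turn: Callable[[Goal], Set[Inference]],
               paths: Callable[[str, str], Iterable[Tuple[str, ...]]]) -> Set[Inference]:
    U: Set[Inference] = set()
    history: Set[Goal] = set()
    queue, queued = deque([goal]), {goal}
    # Lines 3-12: process H-goals breadth-first, each one at most once.
    while queue:
        current = queue.popleft()
        queued.discard(current)
        history.add(current)
        inferences = trace_one_turn(current)
        U |= inferences
        for premises, _ in inferences:
            for p in premises:
                if is_goal(p, "h") and p not in history and p not in queued:
                    queue.append(p)
                    queued.add(p)
    # Lines 13-17: one edge-level inference per regular path of each N_X ~> N_Y.
    regular = {p for premises, _ in U for p in premises if is_goal(p, "r")}
    for r in regular:
        for path in paths(r[1], r[2]):
            U.add((frozenset(path), r))
    return U


goal = ("h", "A", "B")
toy_trace = {
    goal: {(frozenset({("h", "A", "∃t.E"), ("h", "E", "F"), ("r", "∃t.F", "B")}), goal)},
    ("h", "A", "∃t.E"): {(frozenset({("r", "A", "∃t.E")}), ("h", "A", "∃t.E"))},
    ("h", "E", "F"): {(frozenset({("r", "E", "F")}), ("h", "E", "F"))},
}
toy_paths = {("A", "∃t.E"): [("e1", "e2", "e3")], ("E", "F"): [("e4",)],
             ("∃t.F", "B"): [("e6",)]}
U = complete_h(goal, lambda g: toy_trace.get(g, set()),
               lambda x, y: toy_paths.get((x, y), []))
print(len(U))
```

With this toy input, U contains the three H-inferences of Example 6 plus one edge-level inference per regular path, six inferences in total.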


Step 2. Then Algorithm minH computes all justifications for $X_0 \sqsubseteq Y_0$ as follows:


3 Here a clause $c$ is smaller than $c'$ if all the literals of $c$ are in $c'$.
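Algorithm 2: CompleteH
input : $N_X \overset{h}{\rightsquigarrow} N_Y$
output: U: a complete set of inferences for $N_X \overset{h}{\rightsquigarrow} N_Y$
1  U, history, Q ← ∅;  // Q is a queue
2  Q.add($N_X \overset{h}{\rightsquigarrow} N_Y$);
3  while Q ≠ ∅ do
4      $N_{X_1} \overset{h}{\rightsquigarrow} N_{Y_1}$ ← Q.takeNext();
5      history.add($N_{X_1} \overset{h}{\rightsquigarrow} N_{Y_1}$);
6      U ← U ∪ trace_one_turn($N_{X_1} \overset{h}{\rightsquigarrow} N_{Y_1}$);
7      for $N_{X_2} \overset{h}{\rightsquigarrow} N_{Y_2}$ appearing in trace_one_turn($N_{X_1} \overset{h}{\rightsquigarrow} N_{Y_1}$) do
8          if $N_{X_2} \overset{h}{\rightsquigarrow} N_{Y_2}$ ∉ history and $N_{X_2} \overset{h}{\rightsquigarrow} N_{Y_2}$ ∉ Q then
9              Q.add($N_{X_2} \overset{h}{\rightsquigarrow} N_{Y_2}$)
10         end
11     end
12 end
13 for $N_{X_2} \rightsquigarrow N_{Y_2}$ appearing in U do
14     for p = {e_1, e_2, ..., e_n} ∈ path($N_{X_2}$, $N_{Y_2}$) do
15         U.add(({e_1, e_2, ..., e_n}, $N_{X_2} \rightsquigarrow N_{Y_2}$));
16     end
17 end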


- Line 3 of Algorithm 1: It computes all minimal H-paths from $N_{X_0}$ to $N_{Y_0}$ using the resolution procedure developed for PULi⁴ over the clauses generated from U, as illustrated in Sect. 4.1. Here, a literal p is associated with each edge e, each $N_X \overset{h}{\rightsquigarrow} N_Y$, and each $N_X \rightsquigarrow N_Y$ in U. The answer literals are those associated with edges.

- Lines 4–8 of Algorithm 1: It computes justifications by mapping back all the minimal H-paths and selecting the subset-minimal sets, as stated in Theorem 10.


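Example 13 (Example 4 cont'd). Assume $X_0 = G$ and $Y_0 = D$ are the input of minH. Then at Line 2 of minH, we have $U = \{\rho^1, \rho^2\}$, where $\rho^1 = (\{N_G \rightsquigarrow N_D\}, N_G \overset{h}{\rightsquigarrow} N_D)$ is an H-inference obtained by CompleteH (Lines 3–12) and $\rho^2 = (\{e_0, e_1, e_8\}, N_G \rightsquigarrow N_D)$ is produced from the regular paths obtained by CompleteH (Lines 13–17). Let us denote by $\bar{p}_0, \bar{p}_1, \bar{p}_8$ the answer literals of $e_0, e_1, e_8$, and let $p_3 : N_G \rightsquigarrow N_D$ and $p_4 : N_G \overset{h}{\rightsquigarrow} N_D$. Then $clauses(U) = \{\neg p_3 \vee p_4,\ \neg\bar{p}_0 \vee \neg\bar{p}_1 \vee \neg\bar{p}_8 \vee p_3\}$.

By resolution over $clauses(U)$, we obtain min_hpaths $= \{\{e_0, e_1, e_8\}\}$ at Line 3 of minH. Then the output of minH is $J = \{\{\alpha_1, \alpha_6, \alpha_7\}\}$, which is the set of all justifications for $G \sqsubseteq D$.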


4 Available at https://github.com/liveontologies/pinpointing-experiments.
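Algorithm 3: trace_one_turn
input : $N_X \overset{h}{\rightsquigarrow} N_Y$
output: the set result of all H-inferences whose conclusion is $N_X \overset{h}{\rightsquigarrow} N_Y$
1  result ← ∅;
2  P1(X, Y) ← {({$N_{B_1}, \ldots, N_{B_m}$}, {$N_A$}) ∈ $\mathcal{E}_\mathcal{O}$ | $\mathcal{O} \models X \sqsubseteq A$ and $\mathcal{O} \models A \sqsubseteq Y$};
3  for ({$N_{B_1}, \ldots, N_{B_m}$}, {$N_A$}) ∈ P1(X, Y) do
4      if path($N_A$, $N_Y$) ≠ ∅ or Y = A then
5          result.add(({$N_X \overset{h}{\rightsquigarrow} N_{B_1}, \ldots, N_X \overset{h}{\rightsquigarrow} N_{B_m}, N_A \rightsquigarrow N_Y$}, $N_X \overset{h}{\rightsquigarrow} N_Y$));
6      end
7  end
8  P2(X, Y) ← {(r, B_1, B_2) | $\mathcal{O} \models X \sqsubseteq \exists r.B_1$, $B_1 \sqsubseteq B_2$, $\exists r.B_2 \sqsubseteq Y$};
9  for (r, B_1, B_2) ∈ P2(X, Y) do
10     if path($N_{\exists r.B_2}$, $N_Y$) ≠ ∅ or Y = ∃r.B_2 then
11         result.add(({$N_X \overset{h}{\rightsquigarrow} N_{\exists r.B_1}, N_{B_1} \overset{h}{\rightsquigarrow} N_{B_2}, N_{\exists r.B_2} \rightsquigarrow N_Y$}, $N_X \overset{h}{\rightsquigarrow} N_Y$));
12     end
13 end
14 P3(X, Y) ← {(r, s, t, A_1, A_2) | $r \circ s \sqsubseteq t \in \mathcal{O}$, $\mathcal{O} \models X \sqsubseteq \exists r.A_1$, $A_1 \sqsubseteq \exists s.A_2$, $\exists t.A_2 \sqsubseteq Y$};
15 for (r, s, t, A_1, A_2) ∈ P3(X, Y) do
16     if path($N_{\exists t.A_2}$, $N_Y$) ≠ ∅ or Y = ∃t.A_2 then
17         result.add(({$N_X \overset{h}{\rightsquigarrow} N_{\exists r.A_1}, N_{A_1} \overset{h}{\rightsquigarrow} N_{\exists s.A_2}, N_{\exists t.A_2} \rightsquigarrow N_Y, (\{N_r, N_s\}, \{N_s, N_t\})$}, $N_X \overset{h}{\rightsquigarrow} N_Y$));
18     end
19 end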


4.3 Optimization 


Below we present two optimizations that have been implemented in order to accelerate the computation of all justifications.

1. In Algorithm 3, for the H-inference added at Line 5, we require that there exists at least one regular path from $N_A$ to $N_Y$ that contains none of the edges $e_i = (\{N_A\}, \{N_{B_i}\})$, $1 \le i \le m$. Otherwise, as shown in Fig. 4, the H-paths corresponding to this H-inference are not minimal, as they all contain an H-path from $N_X$ to $N_Y$ of the form $H_{X,B_i} \cup (P_{A,Y} - \{e_i\})$. In the same spirit, we require that the H-path from $N_X$ to $N_{B_i}$ does not pass through $N_A$.

2. If we have an H-path $H_{A,B} = H_{A,\exists r.B_1} \cup H_{B_1,B_2} \cup P_{\exists r.B_2,B}$ where

$$H_{A,\exists r.B_1} = H_{A,\exists r.C} \cup H_{C,B_1}, \qquad (2)$$

then $H_{C,B_2} = H_{C,B_1} \cup H_{B_1,B_2}$ is also an H-path and $H_{A,B} = H_{A,\exists r.C} \cup H_{C,B_2} \cup P_{\exists r.B_2,B}$. The two different ways to decompose $H_{A,B}$ above are already considered at Line 8 when executing Algorithm 3 with the input $N_A \overset{h}{\rightsquigarrow} N_B$. This means that the decomposition (2) is redundant. We can avoid such redundancy by requiring $\exists r.B_2 \neq Y$ at Line 11.
Fig. 4. Illustration of Optimization 1


5 Experiments 


To evaluate and validate our approach, we compare minH⁵ with PULi [14], the state-of-the-art algorithm for computing justifications at the time of writing. Both methods compute all justifications based on resolution, but with different inferences generated in different ways. PULi uses a complete set (next denoted by elk) generated by the ELK reasoner [13], which uses inference rules slightly different from those in Table 1. Our method uses the complete set U generated by Step 1 of minH, described in Sect. 4.2. To analyze the performance of our setting, we make the following two measurements: (1) we compare the size of elk with that of U, and (2) we compare the time cost of PULi with that of minH. All the experiments were conducted on a machine with an Intel Xeon 2.6 GHz CPU and 128 GiB of RAM.

The experiments were carried out on four different ontologies⁶: go-plus, galen7, and SnomedCT (versions of Jan. 2015 and Jan. 2021). All the non-$\mathcal{EL}^+$ axioms are deleted. Here, go-plus and galen7 are the same ontologies as used in [14]. We denote the four ontologies above by go-plus, galen7, snt2015 and snt2021. The numbers of axioms, concepts, roles, and queries for each ontology are shown in Table 3.

Next, a query refers to a direct subsumption⁷ $A \sqsubseteq B$. In our experiments, for the four ontologies, the set of all justifications $\mathcal{J}_\mathcal{O}(A \sqsubseteq B)$ is computed for each query $A \sqsubseteq B$. A query $A \sqsubseteq B$ is called trivial iff all minimal H-paths from $N_A$ to $N_B$ are regular paths; otherwise, the query is non-trivial.


Comparing Complete Sets: U vs. elk. We summarize our results in Table 4 and Fig. 5. Table 4 shows that on all four ontologies, U is much smaller than elk on average. Especially on galen7, elk is on average almost 50 times larger than U. The gap is even more significant for the median value, since a large part of the queries is trivial. However, the gap is much smaller for the maximal size. On snt2021, the largest U is about three times larger than the largest elk.
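The ratios quoted in this paragraph can be recomputed from Table 4. The numbers below are copied from the table, and the script only divides them.

```python
# Average and maximal sizes of the complete sets, copied from Table 4.
elk_avg = {"go-plus": 166.9, "galen7": 3602.0, "snt2015": 114.7, "snt2021": 67.3}
u_avg = {"go-plus": 34.2, "galen7": 74.6, "snt2015": 29.4, "snt2021": 19.4}
elk_max = {"go-plus": 7919, "galen7": 81501, "snt2015": 2357, "snt2021": 2226}
u_max = {"go-plus": 7772, "galen7": 24103, "snt2015": 2002, "snt2021": 6452}

# How many times larger elk is than U on average, and U than elk at the maximum.
avg_ratio = {o: elk_avg[o] / u_avg[o] for o in elk_avg}
max_ratio = {o: u_max[o] / elk_max[o] for o in u_max}

for o in elk_avg:
    print(f"{o:8s} elk/U (average) = {avg_ratio[o]:5.1f}   U/elk (max) = {max_ratio[o]:4.2f}")
```

This gives an average ratio of about 48 for galen7 (the "almost 50 times" above) and a ratio of about 2.9 between the maximal sizes of U and elk on snt2021.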


5 A prototype is available at https://gitlab.lisn.upsaclay.fr/yang/minH. 

6 Available at https://osf.io/9sj8n/, https://www.snomed.org/.

7 I.e., $\mathcal{O} \models A \sqsubseteq B$ and there is no other atomic concept $A'$ such that $\mathcal{O} \models A \sqsubseteq A'$ and $\mathcal{O} \models A' \sqsubseteq B$. Direct subsumptions can be computed by a reasoner supporting ontology classification.
Table 3. Summary of sizes of the input ontologies.

              go-plus   galen7   snt2015   snt2021
#axioms        105557    44475    311466    362638
#concepts       57173    28482    311480    361226
#roles            157      964        58       132
#queries        90443    91332    461854    566797


Table 4. Summary of the sizes of elk and U.

                          go-plus   galen7    snt2015   snt2021
elk   average             166.9     3602.0    114.7     67.3
      median              43.0      3648.0    10.0      31.0
      max                 7919      81501     2357      2226
U     average             34.2      74.6      29.4      19.4
      median              4.0       5.0       1.0       3.0
      max                 7772      24103     2002      6452
#non-trivial queries      50272     62470     195082    304321


In Fig. 5, for a given query, if the complete set elk contains fewer inferences than U, the corresponding blue point is below the red line. The percentages of such cases are 0.34% for go-plus, 0.066% for galen7, 0.79% for snt2015, and 1.01% for snt2021. This means that, for most queries, the corresponding U is smaller than elk.

As shown in Table 4 and Fig. 5, minH sometimes generates a bigger complete set U than PULi. This may happen, for example, when exponentially many different regular paths occur in the computation process of minH, in which case minH can produce a huge complete set. U can also be bigger than elk when all the regular paths involved are simple. For example, if all regular paths contain only one edge, then the complete set U includes many clauses of the form $\neg p_e \vee p_{N_Y \rightsquigarrow N_Z}$, which happens because H-rules use regular paths. Such a clause $\neg p_e \vee p_{N_Y \rightsquigarrow N_Z}$ is redundant, since it can be omitted by replacing $p_{N_Y \rightsquigarrow N_Z}$ with $p_e$. For elk, this does not happen.


Comparing Time Cost: minH vs. PULi. In the following, we only compare the time cost on non-trivial queries. For trivial queries, all H-paths are regular paths, so all the justifications have already been enumerated by path in minH. It is also easy to compute all the justifications of trivial queries with PULi.

We set a limit of 60 s for each query, and each timed-out query contributes 60 s to the total time cost. To compare minH with PULi, we test all three strategies of the resolution algorithm proposed in [14]: threshold, top down and bottom up. We summarize in Table 5 the total time cost (top) and the timed-out queries (bottom). Figure 6 gives the comparisons over queries that are successful for both minH and PULi.
[Figure panels: (c) snt2015, (d) snt2021]


Fig. 5. Each blue point has coordinates (log(#|U|), log(#|elk|)), where U and elk are generated from a non-trivial query; the red line is x = y. (Color figure online)


As shown in Table 5, when using the threshold strategy, minH is more time-consuming in total (+5%) on snt2021, and minH has more timed-out queries than PULi on snt2015 and snt2021. This is partly due to the fact that U is larger than elk for relatively many queries on snt2015 and snt2021, as shown in Fig. 5. For the remaining 11 cases, minH performs better than PULi in terms of total time cost and number of timed-out queries. Especially on galen7, the gap between the two methods reaches up to ten times for the total time cost. We can see from Table 5 that the threshold strategy performs the best for PULi on all four ontologies. This strategy is also the best strategy for minH, except for galen7, for which the bottom up strategy is the best with minH.
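Similarly, the relative costs discussed above can be recomputed from Table 5 as PULi/minH ratios of total time. The values are copied from the table, and a ratio below 1 means that minH was slower.

```python
# Total time (s) from Table 5, as (PULi, minH) pairs per ontology and strategy.
totals = {
    "go-plus": {"threshold": (8482.7, 7350.3), "top down": (16352.3, 8935.6),
                "bottom up": (73629.1, 17950.9)},
    "galen7": {"threshold": (10796.2, 3681.4), "top down": (43372.9, 10607.9),
               "bottom up": (36300.9, 3156.3)},
    "snt2015": {"threshold": (1956.8, 973.5), "top down": (13650.7, 1107.6),
                "bottom up": (15058.3, 11392.2)},
    "snt2021": {"threshold": (2116.1, 2222.6), "top down": (11573.9, 2361.6),
                "bottom up": (19402.1, 17154.9)},
}

speedup = {(o, s): puli / minh
           for o, row in totals.items() for s, (puli, minh) in row.items()}

for (o, s), v in sorted(speedup.items()):
    print(f"{o:8s} {s:10s} PULi/minH = {v:5.2f}")
```

The threshold ratio on snt2021 is about 0.95 (minH about 5% slower), and the bottom up ratio on galen7 is about 11.5.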

For each strategy detailed in Fig. 6, the black curve (the ordered time costs of minH on successful queries) is always below the red curve (the ordered time costs of PULi on successful queries) for all the ontologies. This suggests that minH spends less time on successful queries. Also, most of the green points are below the red lines, which suggests that minH performs better than PULi most of the time for a given query. In some cases, PULi is more efficient than minH. One possible reason is the following. When computing justifications by resolution, we have to compare clauses and delete the redundant ones (i.e., the non-minimal ones). When regular paths are long, minH may be time-consuming because of these comparisons.
Fig. 6. For each row, the left, middle and right charts correspond to the threshold, top down and bottom up strategies, respectively. The y-axis is the log value of time (s). The red (resp. black) curve presents the ascending ordered (log value of the) time cost of PULi (resp. minH). For a green point (x, y), $e^y$ is the time cost of minH for the query corresponding to the point (x, y') on the red curve. (Color figure online)
Table 5. Total time cost and number of timed-out queries.

Total time (s), PULi/minH:
           threshold         top down          bottom up
go-plus    8482.7/7350.3     16352.3/8935.6    73629.1/17950.9
galen7     10796.2/3681.4    43372.9/10607.9   36300.9/3156.3
snt2015    1956.8/973.5      13650.7/1107.6    15058.3/11392.2
snt2021    2116.1/2222.6     11573.9/2361.6    19402.1/17154.9

Timed-out queries, PULi/minH/both:
           threshold         top down          bottom up
go-plus    116/103/93        202/117/114       935/223/223
galen7     48/43/43          370/123/120       228/38/38
snt2015    0/3/0             49/3/3            96/88/83
snt2021    2/8/1             39/9/9            144/133/128


6 Conclusion 


In this paper, we introduce and investigate a new set of sound and complete inference rules based on a hypergraph representation of ontologies. We design the algorithm minH, which leverages these inference rules to compute all justifications for a given conclusion. The key to the performance of our method is that regular paths are used as elementary components of H-paths, which reduces the size of complete sets because (1) the rules are more compact than standard ones, and (2) redundant inferences are captured and eliminated by regular paths (see Sect. 4.3). The efficiency of the algorithm minH has been validated by our experiments, which show that it outperforms PULi in most cases.

There are still many possible extensions and applications of the hypergraph 
approach. For instance, to get even more compact inference rules, we could 
extend the notion of regular path to a more general one that will encapsulate the 
inference rule Ho in the same way as regular paths are encapsulated in H-rules. 
Moreover, we will try to apply our approach to other tasks, such as classification
and the computation of logical differences [15].
Abstract. Choice logics constitute a family of propositional logics and are used
for the representation of preferences, with qualitative choice logic (QCL) in
particular being an established formalism with numerous applications in artificial
intelligence. While computational properties and applications of choice logics
have been studied in the literature, only a few results are known about the
proof-theoretic aspects of their use. We propose a sound and complete sequent
calculus for preferred model entailment in QCL, where a formula F is entailed by
a QCL-theory T if F is true in all preferred models of T. The calculus is based on
labeled sequent and refutation calculi, and can be easily adapted for different
purposes. For instance, using the calculus as a cornerstone, calculi for other
choice logics such as conjunctive choice logic (CCL) can be obtained in a
straightforward way.


1 Introduction 


Choice logics are propositional logics for the representation of alternative options
for problem solutions [4]. These logics add new connectives to classical propositional
logic that allow for the formalization of ranked options. A prominent example is
qualitative choice logic (QCL for short) [7], which adds the connective ordered
disjunction $\vec{\times}$ to classical propositional logic. Intuitively, $A \vec{\times} B$ means
that A should hold if possible, but if A is not possible, then at least B should hold.
The semantics of a choice logic induces a preference ordering over the models of a
formula.

As choice logics are well suited for preference handling, they have a multitude
of applications in AI, such as logic programming [8], alert correlation [3], or
database querying [13]. But while computational properties and applications of
choice logics have been studied in the literature, only a few results are known
about the proof-theoretic aspects of their use. In particular, there is no proof
system capable of deriving valid sentences containing choice operators. In this
paper we propose a sound and complete calculus for preferred model entailment
in QCL that can easily be generalized to other choice logics.

Entailment in choice logics is non-monotonic: conclusions that have been 
drawn might not be derivable in light of new information. It is therefore not 
surprising that choice logics are related to other non-monotonic formalisms. For 
instance, it is known [7] that QCL can capture propositional circumscription
and that, if additional symbols in the language are admitted, circumscription 
can be used to generate models corresponding to the inclusion-preferred QCL 
models up to the additional atoms. We do not intend to use this translation of 
our choice logic formulas (or sequents) in order to employ an existing calculus 
for circumscription, for instance [5]. 

Instead, we define calculi in sequent format directly for choice logics, which 
are different from existing non-monotonic logics in the way non-monotonicity 
is introduced. Specifically, the non-standard part of our logics is a new logi- 
cal connective which is fully embedded in the logical language. For this reason, 
calculi for choice logics also differ from most other calculi for non-monotonic 
logics: our calculi do not use non-standard inference rules as in default logic, 
modal operators expressing consistency or belief as in autoepistemic logic, or 
predicates whose extensions are minimized as in circumscription. However, one 
method that can also be applied to choice logics is the use of a refutation calculus 
(also known as rejection or antisequent calculus) axiomatising invalid formulas, 
i.e., non-theorems. Refutation calculi for non-monotonic logics were used in [5]. 
Specifically, by combining a refutation calculus with an appropriate sequent cal- 
culus, elegant proof systems for the central non-monotonic formalisms of default 
logic [16], autoepistemic logic [15], and circumscription [14] were obtained. How- 
ever, to apply this idea to choice logics, we have to take another facet of their 
semantics into account. 

With choice logics, we are working in a setting similar to many-valued logics.
Interpretations ascribe a natural number, called the satisfaction degree, to choice
logic formulas. Preferred models of a formula are then those models with the
least degree. There are several kinds of sequent calculus systems for many-valued
logics, among which the representation as a hypersequent calculus [1,10] plays a
prominent role. However, there are crucial differences between choice logics and
many-valued logics in the usual sense. Firstly, choice logic interpretations are
classical, i.e., they set propositional variables to either true or false. Secondly,
non-classical satisfaction degrees only arise when choice connectives, e.g. ordered
disjunction in QCL, occur in a formula. Thirdly, when applying a choice connective
$\circ$ to two formulas A and B, the degree of $A \circ B$ does not only depend on the
degrees of A and B, but also on the maximum degrees that A and B can possibly
assume. Therefore, techniques used in proof systems for conventional many-valued
logics cannot be applied directly to choice logics.

In [11] a sequent calculus based system for reasoning with contrary-to-duty 
obligations was introduced, where a non-classical connective was defined to cap- 
ture the notion of reparational obligation, which is in force only when a violation 
of a norm occurs. This is related to the ordered disjunction in QCL; however,
based on the intended use in [11], the system was defined only for the occurrence
of the new connective on the right side of the sequent sign. We aim for a proof 
system for reasoning with choice logic operators, and to deduce formulas from 
choice logic formulas. Thus, we need a calculus with left and right inference rules. 
To obtain such a calculus we combine the idea of a refutation calculus with
methods developed for multi-valued logics in a novel way. First, we develop a 
(monotonic) sequent calculus for reasoning about satisfaction degrees using a 
labeled calculus, a method developed for (finite) many-valued logics [2,9,12]. 
Secondly, we define a labeled refutation calculus for reasoning about invalidity 
in terms of satisfaction degrees. Finally, we join both calculi to obtain a sequent 
calculus for the non-monotonic entailment of QCL. To this end, we introduce a 
new, non-monotonic inference rule that has sequents of the two labeled calculi 
as premises and formalizes degree minimization. 

The rest of this paper is organized as follows. In the next section we present
the basic notions of choice logics and introduce the most prominent choice logics
QCL and CCL (conjunctive choice logic). In Sect. 3 we develop a labeled sequent
calculus for propositional logic extended by the QCL connective $\vec{\times}$. This
calculus is shown to be sound and complete and can already be used to derive
interesting sentences containing choice operators. In Sect. 4 we extend the
previously defined sequent calculus with an appropriate refutation calculus and
non-monotonic reasoning to capture entailment in QCL. The methodology developed
for QCL can be extended to other choice logics as well. In particular, we show in
Sect. 5 how the calculi can be adapted for CCL.


2 Choice Logics 


First, we formally define the notion of choice logics in accordance with the choice 
logic framework of [4] before giving concrete examples in the form of QCL and 
CCL. Finally, we define preferred model entailment. 


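Definition 1. Let $\mathcal{U}$ denote the alphabet of propositional variables. The set of
choice connectives $\mathcal{C}_\mathcal{L}$ of a choice logic $\mathcal{L}$ is a finite set of symbols such that
$\mathcal{C}_\mathcal{L} \cap \{\neg, \land, \lor\} = \emptyset$. The set $\mathcal{F}_\mathcal{L}$ of formulas of $\mathcal{L}$ is defined inductively as follows:
(i) $a \in \mathcal{F}_\mathcal{L}$ for all $a \in \mathcal{U}$; (ii) if $F \in \mathcal{F}_\mathcal{L}$, then $(\neg F) \in \mathcal{F}_\mathcal{L}$; (iii) if $F, G \in \mathcal{F}_\mathcal{L}$,
then $(F \circ G) \in \mathcal{F}_\mathcal{L}$ for $\circ \in (\{\land, \lor\} \cup \mathcal{C}_\mathcal{L})$.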


For example, $\mathcal{C}_{QCL} = \{\vec{\times}\}$ and $((a \vec{\times} c) \land (b \vec{\times} c)) \in \mathcal{F}_{QCL}$. Formulas that do not
contain a choice connective are referred to as classical formulas.

The semantics of a choice logic is given by two functions, satisfaction degree
and optionality. The satisfaction degree of a formula given an interpretation
is either a natural number or $\infty$. The lower this degree, the more preferable
the interpretation. The optionality of a formula describes the maximum finite
satisfaction degree that this formula can be ascribed, and is used to penalize
non-satisfaction.


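Definition 2. The optionality of a choice connective $\circ \in \mathcal{C}_\mathcal{L}$ in a choice logic $\mathcal{L}$
is given by a function $\mathrm{opt}^\circ_\mathcal{L}: \mathbb{N}^2 \to \mathbb{N}$ such that $\mathrm{opt}^\circ_\mathcal{L}(k, \ell) \le (k+1) \cdot (\ell+1)$ for
all $k, \ell \in \mathbb{N}$. The optionality of an $\mathcal{L}$-formula is given via $\mathrm{opt}_\mathcal{L}: \mathcal{F}_\mathcal{L} \to \mathbb{N}$ with
(i) $\mathrm{opt}_\mathcal{L}(a) = 1$ for every $a \in \mathcal{U}$; (ii) $\mathrm{opt}_\mathcal{L}(\neg F) = 1$; (iii) $\mathrm{opt}_\mathcal{L}(F \land G) = \mathrm{opt}_\mathcal{L}(F \lor G) = \max(\mathrm{opt}_\mathcal{L}(F), \mathrm{opt}_\mathcal{L}(G))$; (iv) $\mathrm{opt}_\mathcal{L}(F \circ G) = \mathrm{opt}^\circ_\mathcal{L}(\mathrm{opt}_\mathcal{L}(F), \mathrm{opt}_\mathcal{L}(G))$ for
every choice connective $\circ \in \mathcal{C}_\mathcal{L}$.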
The optionality of a classical formula is always 1. Note that, for any choice
connective $\circ$, the optionality of $F \circ G$ is bounded such that $\mathrm{opt}_\mathcal{L}(F \circ G) \le
(\mathrm{opt}_\mathcal{L}(F) + 1) \cdot (\mathrm{opt}_\mathcal{L}(G) + 1)$. In the following, we write $\overline{\mathbb{N}}$ for $\mathbb{N} \cup \{\infty\}$.


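Definition 3. The satisfaction degree of a choice connective $\circ \in \mathcal{C}_\mathcal{L}$ in a choice
logic $\mathcal{L}$ is given by a function $\deg^\circ_\mathcal{L}: \mathbb{N}^2 \times \overline{\mathbb{N}}^2 \to \overline{\mathbb{N}}$ such that $\deg^\circ_\mathcal{L}(k, \ell, m, n) \le \mathrm{opt}^\circ_\mathcal{L}(k, \ell)$
or $\deg^\circ_\mathcal{L}(k, \ell, m, n) = \infty$ for all $k, \ell \in \mathbb{N}$ and all $m, n \in \overline{\mathbb{N}}$. The satisfaction
degree of an $\mathcal{L}$-formula under an interpretation $\mathcal{I} \subseteq \mathcal{U}$ is given via the
function $\deg_\mathcal{L}: 2^\mathcal{U} \times \mathcal{F}_\mathcal{L} \to \overline{\mathbb{N}}$ with

- $\deg_\mathcal{L}(\mathcal{I}, a) = 1$ if $a \in \mathcal{I}$, and $\deg_\mathcal{L}(\mathcal{I}, a) = \infty$ otherwise, for every $a \in \mathcal{U}$;
- $\deg_\mathcal{L}(\mathcal{I}, \neg F) = 1$ if $\deg_\mathcal{L}(\mathcal{I}, F) = \infty$, and $\deg_\mathcal{L}(\mathcal{I}, \neg F) = \infty$ otherwise;
- $\deg_\mathcal{L}(\mathcal{I}, F \land G) = \max(\deg_\mathcal{L}(\mathcal{I}, F), \deg_\mathcal{L}(\mathcal{I}, G))$;
- $\deg_\mathcal{L}(\mathcal{I}, F \lor G) = \min(\deg_\mathcal{L}(\mathcal{I}, F), \deg_\mathcal{L}(\mathcal{I}, G))$;
- $\deg_\mathcal{L}(\mathcal{I}, F \circ G) = \deg^\circ_\mathcal{L}(\mathrm{opt}_\mathcal{L}(F), \mathrm{opt}_\mathcal{L}(G), \deg_\mathcal{L}(\mathcal{I}, F), \deg_\mathcal{L}(\mathcal{I}, G))$ for $\circ \in \mathcal{C}_\mathcal{L}$.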


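We also write $\mathcal{I} \models^\mathcal{L}_m F$ for $\deg_\mathcal{L}(\mathcal{I}, F) = m$. If $m < \infty$, we say that $\mathcal{I}$ satisfies F
(to a finite degree), and if $m = \infty$, then $\mathcal{I}$ does not satisfy F. If F is a classical
formula, then $\mathcal{I} \models^\mathcal{L}_1 F \iff \mathcal{I} \models F$ and $\mathcal{I} \models^\mathcal{L}_\infty F \iff \mathcal{I} \not\models F$. The symbols $\top$
and $\bot$ are shorthand for the formulas $(a \lor \neg a)$ and $(a \land \neg a)$, where a can be any
variable. We have $\mathrm{opt}_\mathcal{L}(\top) = \mathrm{opt}_\mathcal{L}(\bot) = 1$, $\deg_\mathcal{L}(\mathcal{I}, \top) = 1$ and $\deg_\mathcal{L}(\mathcal{I}, \bot) = \infty$
for any interpretation $\mathcal{I}$ in every choice logic.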

Models and preferred models of formulas are defined in the following way: 


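Definition 4. Let $\mathcal{L}$ be a choice logic, $\mathcal{I}$ an interpretation, and F an $\mathcal{L}$-formula.
$\mathcal{I}$ is a model of F, written as $\mathcal{I} \in \mathrm{Mod}_\mathcal{L}(F)$, if $\deg_\mathcal{L}(\mathcal{I}, F) < \infty$.
$\mathcal{I}$ is a preferred model of F, written as $\mathcal{I} \in \mathrm{Prf}_\mathcal{L}(F)$, if $\mathcal{I} \in \mathrm{Mod}_\mathcal{L}(F)$ and
$\deg_\mathcal{L}(\mathcal{I}, F) \le \deg_\mathcal{L}(\mathcal{J}, F)$ for all other interpretations $\mathcal{J}$.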


Moreover, we define the notion of classical counterparts for choice connectives. 


Definition 5. Let $\mathcal{L}$ be a choice logic. The classical counterpart of a choice
connective $\circ \in \mathcal{C}_\mathcal{L}$ is the classical binary connective $\circledast$ such that, for all atoms
a and b, $\deg_\mathcal{L}(\mathcal{I}, a \circ b) < \infty \iff \mathcal{I} \models a \circledast b$. The classical counterpart of an
$\mathcal{L}$-formula F is denoted as $cp(F)$ and is obtained by replacing all occurrences of
choice connectives in F by their classical counterparts.


A natural property of known choice logics is that choice connectives can be
replaced by their classical counterparts without affecting satisfiability, meaning
that $\deg_\mathcal{L}(\mathcal{I}, F) < \infty \iff \mathcal{I} \models cp(F)$ holds for all $\mathcal{L}$-formulas F.

So far we have introduced choice logics in a quite abstract way. We now introduce
two particular instantiations, namely QCL, the first and most prominent choice
logic in the literature, and CCL, which introduces a connective $\vec{\odot}$ called ordered
conjunction in place of QCL's ordered disjunction.
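Definition 6. QCL is the choice logic such that $\mathcal{C}_{QCL} = \{\vec{\times}\}$, and, if $k =
\mathrm{opt}_{QCL}(F)$, $\ell = \mathrm{opt}_{QCL}(G)$, $m = \deg_{QCL}(\mathcal{I}, F)$, and $n = \deg_{QCL}(\mathcal{I}, G)$, then

$$\mathrm{opt}_{QCL}(F \vec{\times} G) = \mathrm{opt}^{\vec{\times}}_{QCL}(k, \ell) = k + \ell, \text{ and}$$

$$\deg_{QCL}(\mathcal{I}, F \vec{\times} G) = \deg^{\vec{\times}}_{QCL}(k, \ell, m, n) = \begin{cases} m & \text{if } m < \infty;\\ n + k & \text{if } m = \infty, n < \infty;\\ \infty & \text{otherwise.} \end{cases}$$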


In the above definition, we can see how optionality is used to penalize non-satisfaction:
given a QCL-formula $F \vec{\times} G$ and an interpretation $\mathcal{I}$, if $\mathcal{I}$ satisfies F
(to some finite degree), then $\deg_{QCL}(\mathcal{I}, F \vec{\times} G) = \deg_{QCL}(\mathcal{I}, F)$; if $\mathcal{I}$
does not satisfy F, then $\deg_{QCL}(\mathcal{I}, F \vec{\times} G) = \mathrm{opt}_{QCL}(F) + \deg_{QCL}(\mathcal{I}, G)$. Since
$\deg_{QCL}(\mathcal{I}, F) \le \mathrm{opt}_{QCL}(F)$, interpretations that satisfy F result in a lower
degree, i.e., are more preferable, compared to interpretations that do not satisfy
F. Let us take a look at a concrete example:


Example 1. Consider the QCL-formula $F = (a \vec{\times} c) \land (b \vec{\times} c)$. Note that the classical
counterpart of $\vec{\times}$ is $\lor$, i.e., $cp(F) = (a \lor c) \land (b \lor c)$. Thus, $\{c\}, \{a, b\}, \{a, c\},
\{b, c\}, \{a, b, c\} \in \mathrm{Mod}_{QCL}(F)$. Of these models, $\{a, b\}$ and $\{a, b, c\}$ satisfy F to
a degree of 1, while $\{c\}$, $\{a, c\}$, and $\{b, c\}$ satisfy F to a degree of 2. Therefore,
$\{a, b\}, \{a, b, c\} \in \mathrm{Prf}_{QCL}(F)$.
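Definitions 2, 3 and 6 translate directly into a small evaluator. The following Python sketch is our own illustration, not code from the paper: it assumes formulas are encoded as nested tuples, with atoms as strings and the hypothetical tags "not", "and", "or" and "ox" (for ordered disjunction), and it recomputes the models and preferred models of Example 1 by enumerating all interpretations over {a, b, c}.

```python
import math
from itertools import combinations

INF = math.inf  # the degree "infinity", i.e. not satisfied


def opt(f):
    """Optionality (Definitions 2 and 6); classical formulas have optionality 1."""
    if isinstance(f, str) or f[0] == "not":
        return 1
    op, a, b = f
    if op in ("and", "or"):
        return max(opt(a), opt(b))
    return opt(a) + opt(b)  # "ox": opt(F ox G) = k + l


def deg(i, f):
    """Satisfaction degree of f under interpretation i (the set of true atoms)."""
    if isinstance(f, str):
        return 1 if f in i else INF
    if f[0] == "not":
        return 1 if deg(i, f[1]) == INF else INF
    op, a, b = f
    m, n = deg(i, a), deg(i, b)
    if op == "and":
        return max(m, n)
    if op == "or":
        return min(m, n)
    if m < INF:                             # "ox": F is satisfied
        return m
    return opt(a) + n if n < INF else INF   # only G is satisfied, or neither


def interpretations(atoms):
    """All subsets of the given atoms, smallest first."""
    atoms = sorted(atoms)
    for r in range(len(atoms) + 1):
        for c in combinations(atoms, r):
            yield frozenset(c)


F = ("and", ("ox", "a", "c"), ("ox", "b", "c"))  # the formula of Example 1
models = {i: deg(i, F) for i in interpretations("abc") if deg(i, F) < INF}
best = min(models.values())
preferred = [sorted(i) for i, d in models.items() if d == best]
print(preferred)  # [['a', 'b'], ['a', 'b', 'c']]
```

The same brute-force enumeration also makes the classical-counterpart property easy to spot-check: for every interpretation, `deg(i, F) < INF` coincides with the truth of $(a \lor c) \land (b \lor c)$.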


Next, we define CCL. Note that we follow the revised definition of CCL [4], which
differs from the initial specification¹. Intuitively, given a CCL-formula $F \vec{\odot} G$, it
is best to satisfy both F and G, but also acceptable to satisfy only F.


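Definition 7. CCL is the choice logic such that $\mathcal{C}_{CCL} = \{\vec{\odot}\}$, and, if $k =
\mathrm{opt}_{CCL}(F)$, $\ell = \mathrm{opt}_{CCL}(G)$, $m = \deg_{CCL}(\mathcal{I}, F)$, and $n = \deg_{CCL}(\mathcal{I}, G)$, then

$$\mathrm{opt}_{CCL}(F \vec{\odot} G) = \mathrm{opt}^{\vec{\odot}}_{CCL}(k, \ell) = k + \ell, \text{ and}$$

$$\deg_{CCL}(\mathcal{I}, F \vec{\odot} G) = \deg^{\vec{\odot}}_{CCL}(k, \ell, m, n) = \begin{cases} n & \text{if } m = 1, n < \infty;\\ m + \ell & \text{if } m < \infty \text{ and } (m > 1 \text{ or } n = \infty);\\ \infty & \text{otherwise.} \end{cases}$$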


Example 2. Consider the CCL-formula $G = (a \vec{\odot} c) \land (b \vec{\odot} c)$. Note that the classical
counterpart of $\vec{\odot}$ is the first projection, i.e., $cp(G) = a \land b$. Thus, $\{a, b\},
\{a, b, c\} \in \mathrm{Mod}_{CCL}(G)$. Of these models, $\{a, b, c\}$ satisfies G to a degree of 1,
while $\{a, b\}$ satisfies G to a degree of 2. Therefore, $\{a, b, c\} \in \mathrm{Prf}_{CCL}(G)$.
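Along the same lines, here is a sketch for CCL. It is again our own illustration with the same assumed tuple encoding, now with the hypothetical tag "oc" for ordered conjunction. We take $\mathrm{opt}(F \vec{\odot} G) = k + \ell$, which is the largest finite degree the degree function of Definition 7 can produce when $m \le k$ and $n \le \ell$. The printed degrees reproduce Example 2.

```python
import math
from itertools import combinations

INF = math.inf


def opt(f):
    """Optionality; for ordered conjunction we take opt(F oc G) = k + l."""
    if isinstance(f, str) or f[0] == "not":
        return 1
    op, a, b = f
    if op in ("and", "or"):
        return max(opt(a), opt(b))
    return opt(a) + opt(b)  # "oc"


def deg(i, f):
    """CCL satisfaction degree following Definitions 3 and 7."""
    if isinstance(f, str):
        return 1 if f in i else INF
    if f[0] == "not":
        return 1 if deg(i, f[1]) == INF else INF
    op, a, b = f
    m, n = deg(i, a), deg(i, b)
    if op == "and":
        return max(m, n)
    if op == "or":
        return min(m, n)
    if m == 1 and n < INF:               # F satisfied optimally and G satisfied
        return n
    if m < INF and (m > 1 or n == INF):  # only F counts, penalized by opt(G)
        return m + opt(b)
    return INF                           # F not satisfied


def interpretations(atoms):
    atoms = sorted(atoms)
    for r in range(len(atoms) + 1):
        for c in combinations(atoms, r):
            yield frozenset(c)


G = ("and", ("oc", "a", "c"), ("oc", "b", "c"))  # the formula of Example 2
degrees = {tuple(sorted(i)): deg(i, G) for i in interpretations("abc")
           if deg(i, G) < INF}
print(degrees)  # {('a', 'b'): 2, ('a', 'b', 'c'): 1}
```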


If $\mathcal{L}$ is a choice logic, then a set of $\mathcal{L}$-formulas is called an $\mathcal{L}$-theory. An
$\mathcal{L}$-theory $\mathcal{T}$ entails a classical formula F, written as $\mathcal{T} \mathrel{|\!\sim} F$, if F is true in
all preferred models of $\mathcal{T}$. However, we first need to define what the preferred
models of a choice logic theory are. There are several approaches for this. In the
original QCL paper [7], a lexicographic and an inclusion-based approach were
introduced.

¹ It seems that, under the initial definition of CCL, $a \vec{\odot} b$ is always ascribed a degree
of 1 or $\infty$, i.e., non-classical degrees cannot be obtained (cf. Definition 8 in [6]).
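Definition 8. Let $\mathcal{L}$ be a choice logic, $\mathcal{I}$ an interpretation, and $\mathcal{T}$ an $\mathcal{L}$-theory.
$\mathcal{I} \in \mathrm{Mod}_\mathcal{L}(\mathcal{T})$ if $\deg_\mathcal{L}(\mathcal{I}, F) < \infty$ for all $F \in \mathcal{T}$. $\mathcal{I}^k_\mathcal{L}(\mathcal{T})$ denotes the set of formulas
in $\mathcal{T}$ satisfied to a degree of k by $\mathcal{I}$, i.e., $\mathcal{I}^k_\mathcal{L}(\mathcal{T}) = \{F \in \mathcal{T} \mid \deg_\mathcal{L}(\mathcal{I}, F) = k\}$.

- $\mathcal{I}$ is a lexicographically preferred model of $\mathcal{T}$, written as $\mathcal{I} \in \mathrm{Prf}^{lex}_\mathcal{L}(\mathcal{T})$, if
$\mathcal{I} \in \mathrm{Mod}_\mathcal{L}(\mathcal{T})$ and there is no $\mathcal{J} \in \mathrm{Mod}_\mathcal{L}(\mathcal{T})$ such that, for some $k \in \mathbb{N}$ and
all $l < k$, $|\mathcal{I}^k_\mathcal{L}(\mathcal{T})| < |\mathcal{J}^k_\mathcal{L}(\mathcal{T})|$ and $|\mathcal{I}^l_\mathcal{L}(\mathcal{T})| = |\mathcal{J}^l_\mathcal{L}(\mathcal{T})|$ holds.

- $\mathcal{I}$ is an inclusion-based preferred model of $\mathcal{T}$, written as $\mathcal{I} \in \mathrm{Prf}^{inc}_\mathcal{L}(\mathcal{T})$, if
$\mathcal{I} \in \mathrm{Mod}_\mathcal{L}(\mathcal{T})$ and there is no $\mathcal{J} \in \mathrm{Mod}_\mathcal{L}(\mathcal{T})$ such that, for some $k \in \mathbb{N}$ and
all $l < k$, $\mathcal{I}^k_\mathcal{L}(\mathcal{T}) \subset \mathcal{J}^k_\mathcal{L}(\mathcal{T})$ and $\mathcal{I}^l_\mathcal{L}(\mathcal{T}) = \mathcal{J}^l_\mathcal{L}(\mathcal{T})$ holds.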


In our calculus for preferred model entailment we focus on the lexicographic 
approach, but it will become clear how it can be adapted to other preferred model 
semantics (see Sect. 4). We now formally define preferred model entailment: 


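Definition 9. Let $\mathcal{L}$ be a choice logic, $\mathcal{T}$ an $\mathcal{L}$-theory, S a classical theory, and
$\sigma \in \{lex, inc\}$. $\mathcal{T} \mathrel{|\!\sim}^\sigma_\mathcal{L} S$ if for all $\mathcal{I} \in \mathrm{Prf}^\sigma_\mathcal{L}(\mathcal{T})$ there is $F \in S$ such that $\mathcal{I} \models F$.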


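Example 3. Consider the QCL-theory $\mathcal{T} = \{\neg(a \land b), a \vec{\times} c, b \vec{\times} c\}$. Then $\{c\}, \{a, c\},
\{b, c\} \in \mathrm{Mod}_{QCL}(\mathcal{T})$. Note that, because of $\neg(a \land b)$, a model of $\mathcal{T}$ cannot satisfy
both $a \vec{\times} c$ and $b \vec{\times} c$ to a degree of 1. Specifically,

$$\{a, c\}^1_{QCL}(\mathcal{T}) = \{\neg(a \land b), a \vec{\times} c\} \text{ and } \{a, c\}^2_{QCL}(\mathcal{T}) = \{b \vec{\times} c\},$$
$$\{b, c\}^1_{QCL}(\mathcal{T}) = \{\neg(a \land b), b \vec{\times} c\} \text{ and } \{b, c\}^2_{QCL}(\mathcal{T}) = \{a \vec{\times} c\},$$
$$\{c\}^1_{QCL}(\mathcal{T}) = \{\neg(a \land b)\} \text{ and } \{c\}^2_{QCL}(\mathcal{T}) = \{a \vec{\times} c, b \vec{\times} c\}.$$

Thus, $\{a, c\}, \{b, c\} \in \mathrm{Prf}^{lex}_{QCL}(\mathcal{T})$ but $\{c\} \notin \mathrm{Prf}^{lex}_{QCL}(\mathcal{T})$. It can be concluded that
$\mathcal{T} \mathrel{|\!\sim}^{lex}_{QCL} c \land (a \lor b)$. However, $\mathcal{T} \mathrel{|\!\not\sim}^{lex}_{QCL} a$ and $\mathcal{T} \mathrel{|\!\not\sim}^{lex}_{QCL} b$.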


It is easy to see that preferred model entailment is non-monotonic. For example,
$\{a \vec{\times} b\} \mathrel{|\!\sim}^{lex}_{QCL} a$ but $\{a \vec{\times} b, \neg a\} \mathrel{|\!\not\sim}^{lex}_{QCL} a$.
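Definitions 8 and 9 can likewise be prototyped by brute force. In the sketch below, the tuple encoding (with the hypothetical tag "ox"), the function names, and the restriction to a classical goal formula are our own assumptions, not the paper's. Two models are compared at the first level k where they differ, which is what the "for some k and all l < k" condition amounts to: on the counts for lex, and on the sets themselves for inc. The printed values reproduce Example 3 and the non-monotonicity example.

```python
import math
from itertools import combinations

INF = math.inf


def opt(f):
    if isinstance(f, str) or f[0] == "not":
        return 1
    op, a, b = f
    return max(opt(a), opt(b)) if op in ("and", "or") else opt(a) + opt(b)


def deg(i, f):
    """QCL satisfaction degree ("ox" is ordered disjunction)."""
    if isinstance(f, str):
        return 1 if f in i else INF
    if f[0] == "not":
        return 1 if deg(i, f[1]) == INF else INF
    op, a, b = f
    m, n = deg(i, a), deg(i, b)
    if op == "and":
        return max(m, n)
    if op == "or":
        return min(m, n)
    return m if m < INF else opt(a) + n  # opt(a) + INF is still INF


def atoms(f):
    return {f} if isinstance(f, str) else set().union(*(atoms(g) for g in f[1:]))


def levels(i, theory):
    """[I^1(T), ..., I^K(T)] as sets of formula positions, K = max optionality."""
    top = max(opt(f) for f in theory)
    return [frozenset(j for j, f in enumerate(theory) if deg(i, f) == k)
            for k in range(1, top + 1)]


def dominates(lj, li, mode):
    """Does J (levels lj) beat I (levels li) in the sense of Definition 8?"""
    if mode == "lex":
        return [len(s) for s in lj] > [len(s) for s in li]  # lexicographic on counts
    for si, sj in zip(li, lj):  # "inc": decided at the first differing level
        if si != sj:
            return si < sj      # proper subset
    return False


def preferred(theory, mode="lex"):
    vs = sorted(set().union(*(atoms(f) for f in theory)))
    interps = [frozenset(c) for r in range(len(vs) + 1) for c in combinations(vs, r)]
    mods = [i for i in interps if all(deg(i, f) < INF for f in theory)]
    lv = {i: levels(i, theory) for i in mods}
    return [i for i in mods if not any(dominates(lv[j], lv[i], mode) for j in mods)]


def entails(theory, goal, mode="lex"):
    """T |~ goal for a classical goal: it holds in every preferred model."""
    return all(deg(i, goal) == 1 for i in preferred(theory, mode))


T = [("not", ("and", "a", "b")), ("ox", "a", "c"), ("ox", "b", "c")]  # Example 3
print(sorted(sorted(i) for i in preferred(T)))                      # [['a', 'c'], ['b', 'c']]
print(entails(T, ("and", "c", ("or", "a", "b"))), entails(T, "a"))  # True False
print(entails([("ox", "a", "b")], "a"), entails([("ox", "a", "b"), ("not", "a")], "a"))  # True False
```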


3 The Sequent Calculus L[QCL] 


As a first step towards a calculus for preferred model entailment, we propose a 
labeled calculus [2,12] for reasoning about the satisfaction degrees of QCL formu- 
las in sequent format and prove its soundness and completeness. One advantage 
of the sequent calculus format is having symmetrical left and right rules for all 
connectives, in particular for the choice connectives. This is in contrast to the 
representation of ordered disjunction in the calculus for deontic logic [11], in 
which only right-hand side rules are considered. 

As the calculus will be concerned with satisfaction degrees rather than preferred
models, we need to define entailment in terms of satisfaction degrees. To this
end, the formulas occurring in the sequents of our calculus are labeled with
natural numbers, i.e., they are of the form $(A)_k$, where A is a choice logic formula
and $k \in \mathbb{N}$. $(A)_k$ is satisfied by those interpretations that satisfy A to a degree of
k. Instead of labeling formulas with degree $\infty$, we use the negated formula, i.e.,
instead of $(A)_\infty$ we use $(\neg A)_1$. We observe that $(A)_k$ with $\mathrm{opt}_{QCL}(A) < k$ can never
have a model. We will deal with such formulas by replacing them with $(\bot)_1$. For
classical formulas, we may write A for $(A)_1$.


Definition 10. Let $(A_1)_{k_1}, \ldots, (A_m)_{k_m}$ and $(B_1)_{l_1}, \ldots, (B_n)_{l_n}$ be labeled QCL-formulas.
$(A_1)_{k_1}, \ldots, (A_m)_{k_m} \vdash (B_1)_{l_1}, \ldots, (B_n)_{l_n}$ is a labeled QCL-sequent.
$\Gamma \vdash \Delta$ is valid iff every interpretation that satisfies all labeled formulas in $\Gamma$
to the degree specified by the label also satisfies at least one labeled formula in $\Delta$
to the degree specified by the label.
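Validity in the sense of Definition 10 only quantifies over interpretations of the finitely many atoms occurring in a sequent, so small sequents can be checked by brute force. The following Python sketch is our own illustration, not a procedure from the paper. It assumes formulas are nested tuples with atoms as strings and the hypothetical tags "not", "and", "or", "ox" (ordered disjunction), and it encodes a labeled formula $(A)_k$ as the pair (A, k), writing $(A)_\infty$ as $(\neg A)_1$ as described above. It confirms that the root sequent of Example 4 is valid and finds a countermodel for $(a \vec{\times} b)_1 \vdash b$.

```python
import math
from itertools import combinations

INF = math.inf


def opt(f):
    if isinstance(f, str) or f[0] == "not":
        return 1
    op, a, b = f
    return max(opt(a), opt(b)) if op in ("and", "or") else opt(a) + opt(b)


def deg(i, f):
    """QCL satisfaction degree; "ox" is ordered disjunction."""
    if isinstance(f, str):
        return 1 if f in i else INF
    if f[0] == "not":
        return 1 if deg(i, f[1]) == INF else INF
    op, a, b = f
    m, n = deg(i, a), deg(i, b)
    if op == "and":
        return max(m, n)
    if op == "or":
        return min(m, n)
    return m if m < INF else opt(a) + n  # opt(a) + INF is still INF


def atoms(f):
    return {f} if isinstance(f, str) else set().union(*(atoms(g) for g in f[1:]))


def countermodel(gamma, delta):
    """An interpretation meeting every label in gamma and no label in delta, or None."""
    vs = sorted(set().union(*(atoms(f) for f, _ in gamma + delta)))
    for r in range(len(vs) + 1):
        for c in combinations(vs, r):
            i = frozenset(c)
            if all(deg(i, f) == k for f, k in gamma) and not any(
                deg(i, f) == k for f, k in delta
            ):
                return i
    return None


def valid(gamma, delta):
    """Definition 10: gamma |- delta is valid iff it has no countermodel."""
    return countermodel(gamma, delta) is None


# Root of Example 4:  not(a and b), ((a ox b) and (b ox c))_2 |- a and c, b
gamma = [(("not", ("and", "a", "b")), 1),
         (("and", ("ox", "a", "b"), ("ox", "b", "c")), 2)]
delta = [(("and", "a", "c"), 1), ("b", 1)]
print(valid(gamma, delta))                                        # True
print(sorted(countermodel([(("ox", "a", "b"), 1)], [("b", 1)])))  # ['a']
```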


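Note that entailment in terms of satisfaction degrees, as defined above, is
monotonic. Frequently we will write $(A)_{<k}$ as shorthand for $(A)_1, \ldots, (A)_{k-1}$ and
$(A)_{>k}$ for $(A)_{k+1}, \ldots, (A)_{\mathrm{opt}_{QCL}(A)}, (\neg A)_1$. Moreover, $(\Gamma, (A)_i \vdash \Delta)_{i<k}$ denotes
the sequence of sequents

$$\Gamma, (A)_1 \vdash \Delta \quad \ldots \quad \Gamma, (A)_{k-1} \vdash \Delta.$$

Analogously, $(\Gamma, (A)_i \vdash \Delta)_{i>k}$ stands for the sequence of sequents $\Gamma, (A)_{k+1} \vdash \Delta \ \ldots\
\Gamma, (A)_{\mathrm{opt}_{QCL}(A)} \vdash \Delta,\ \Gamma, (\neg A)_1 \vdash \Delta$.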

We define the sequent calculus L[QCL] over labeled sequents below. In addition
to introducing inference rules for $\vec{\times}$, we have to modify the inference rules
for conjunction and disjunction of propositional LK. The idea behind the $\lor$-left
rule is that a model M of $(A)_k$ is only a model of $(A \lor B)_k$ if there is no $l < k$
s.t. M is a model of $(B)_l$. Therefore, every model of $(A \lor B)_k$ is a model of $\Delta$ iff

- every model of $(A)_k$ is a model of $\Delta$ or of some $(B)_l$ with $l < k$, and
- every model of $(B)_k$ is a model of $\Delta$ or of some $(A)_l$ with $l < k$.

Essentially the same idea works for $\land$-left, but with $l > k$. For the $\lor$-right rule,
in order for every model of $\Gamma$ to be a model of $(A \lor B)_k$, every model of $\Gamma$ must
either be a model of $(A)_k$ or of $(B)_k$, and no model of $\Gamma$ can be a model of $(A)_l$ or
of $(B)_l$ for $l < k$, i.e., $\Gamma, (A)_l \vdash \bot$ and $\Gamma, (B)_l \vdash \bot$. Similarly for $\land$-right.
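The reasoning behind the $\lor$-rules can also be checked semantically on small instances. The sketch below is our own, with the same assumed tuple encoding and purely illustrative helper names. It enumerates a small pool of formulas A and B and side contexts $\Gamma$, $\Delta$, and compares the validity of $\Gamma, (A \lor B)_k \vdash \Delta$ and of $\Gamma \vdash (A \lor B)_k, \Delta$ with the validity of all of the corresponding premises described above. A count of 0 means that, on this pool, each conclusion is valid exactly when all of its premises are, as the soundness and completeness arguments for these rules predict. It is a test on a finite sample, not a proof.

```python
import math
from itertools import combinations, product

INF = math.inf


def opt(f):
    if isinstance(f, str) or f[0] == "not":
        return 1
    op, a, b = f
    return max(opt(a), opt(b)) if op in ("and", "or") else opt(a) + opt(b)


def deg(i, f):
    if isinstance(f, str):
        return 1 if f in i else INF
    if f[0] == "not":
        return 1 if deg(i, f[1]) == INF else INF
    op, a, b = f
    m, n = deg(i, a), deg(i, b)
    if op == "and":
        return max(m, n)
    if op == "or":
        return min(m, n)
    return m if m < INF else opt(a) + n  # "ox"


INTERPS = [frozenset(c) for r in range(4) for c in combinations("abc", r)]


def valid(gamma, delta):
    """Labeled sequent validity over the atoms a, b, c; (A)_k is the pair (A, k)."""
    return all(any(deg(i, f) == k for f, k in delta)
               for i in INTERPS if all(deg(i, f) == k for f, k in gamma))


def below(x, k):
    """(X)_{<k} as a list of labeled formulas."""
    return [(x, l) for l in range(1, k)]


def or_left_premises(gamma, a, b, k, delta):
    return [(gamma + [(a, k)], below(b, k) + delta),
            (gamma + [(b, k)], below(a, k) + delta)]


def or_right_premises(gamma, a, b, k, delta):
    prems = [(gamma + [(x, i)], delta) for x in (a, b) for i in range(1, k)]
    return prems + [(gamma, [(a, k), (b, k)] + delta)]


POOL = ["a", "b", ("not", "c"), ("ox", "a", "b"), ("ox", "b", "c"),
        ("and", ("ox", "a", "c"), "b")]
SIDES = [[], [("a", 1)], [("c", 1)], [(("ox", "c", "a"), 2)]]

mismatches = 0
for a, b, gamma, delta in product(POOL, POOL, SIDES, SIDES):
    disj = ("or", a, b)
    for k in range(1, opt(disj) + 1):
        left = valid(gamma + [(disj, k)], delta)
        right = valid(gamma, [(disj, k)] + delta)
        mismatches += left != all(valid(g, d) for g, d in or_left_premises(gamma, a, b, k, delta))
        mismatches += right != all(valid(g, d) for g, d in or_right_premises(gamma, a, b, k, delta))
print(mismatches)  # 0
```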


Definition 11 (L[QCL]). The axioms of L[QCL] are of the form $(p)_1 \vdash (p)_1$ for
propositional variables p. The inference rules are given below. For the structural
and logical rules, whenever a labeled formula $(F)_k$ appears in the conclusion of
an inference rule, it holds that $k \le \mathrm{opt}_{QCL}(F)$.


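The structural rules are:

$$\dfrac{\Gamma \vdash \Delta}{\Gamma, (A)_k \vdash \Delta}\,(wl) \qquad \dfrac{\Gamma \vdash \Delta}{\Gamma \vdash (A)_k, \Delta}\,(wr) \qquad \dfrac{\Gamma, (A)_k, (A)_k \vdash \Delta}{\Gamma, (A)_k \vdash \Delta}\,(cl) \qquad \dfrac{\Gamma \vdash (A)_k, (A)_k, \Delta}{\Gamma \vdash (A)_k, \Delta}\,(cr)$$

The logical rules are:

$$\dfrac{\Gamma \vdash (cp(A))_1, \Delta}{\Gamma, (\neg A)_1 \vdash \Delta}\,(\neg l) \qquad \dfrac{\Gamma, (cp(A))_1 \vdash \Delta}{\Gamma \vdash (\neg A)_1, \Delta}\,(\neg r)$$

$$\dfrac{\Gamma, (A)_k \vdash (B)_{<k}, \Delta \qquad \Gamma, (B)_k \vdash (A)_{<k}, \Delta}{\Gamma, (A \lor B)_k \vdash \Delta}\,(\lor l) \qquad \dfrac{(\Gamma, (A)_i \vdash \Delta)_{i<k} \qquad (\Gamma, (B)_i \vdash \Delta)_{i<k} \qquad \Gamma \vdash (A)_k, (B)_k, \Delta}{\Gamma \vdash (A \lor B)_k, \Delta}\,(\lor r)$$

$$\dfrac{\Gamma, (A)_k \vdash (B)_{>k}, \Delta \qquad \Gamma, (B)_k \vdash (A)_{>k}, \Delta}{\Gamma, (A \land B)_k \vdash \Delta}\,(\land l) \qquad \dfrac{(\Gamma, (A)_i \vdash \Delta)_{i>k} \qquad (\Gamma, (B)_i \vdash \Delta)_{i>k} \qquad \Gamma \vdash (A)_k, (B)_k, \Delta}{\Gamma \vdash (A \land B)_k, \Delta}\,(\land r)$$

The rules for ordered disjunction, with $k \le \mathrm{opt}_{QCL}(A)$ and $l \le \mathrm{opt}_{QCL}(B)$, are:

$$\dfrac{\Gamma, (A)_k \vdash \Delta}{\Gamma, (A \vec{\times} B)_k \vdash \Delta}\,(\vec{\times} l_1) \qquad \dfrac{\Gamma, (B)_l, (\neg A)_1 \vdash \Delta}{\Gamma, (A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l} \vdash \Delta}\,(\vec{\times} l_2)$$

$$\dfrac{\Gamma \vdash (A)_k, \Delta}{\Gamma \vdash (A \vec{\times} B)_k, \Delta}\,(\vec{\times} r_1) \qquad \dfrac{\Gamma \vdash (\neg A)_1, \Delta \qquad \Gamma \vdash (B)_l, \Delta}{\Gamma \vdash (A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l}, \Delta}\,(\vec{\times} r_2)$$

The degree overflow rules², with $k \in \mathbb{N}$, are:

$$\dfrac{\Gamma, \bot \vdash \Delta}{\Gamma, (A)_{\mathrm{opt}_{QCL}(A)+k} \vdash \Delta}\,(dol) \qquad \dfrac{\Gamma \vdash \Delta}{\Gamma \vdash (A)_{\mathrm{opt}_{QCL}(A)+k}, \Delta}\,(dor)$$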


Observe that the modified $\land$ and $\lor$ inference rules correspond to the $\land$ and $\lor$
inference rules of propositional LK in case we are dealing only with classical
formulas. Our $\land$-left rule splits the proof tree unnecessarily for classical theories,
and the $\land$-right rule adds an unnecessary third condition $\Gamma \vdash A, B, \Delta$. These
additional conditions are necessary when dealing with non-classical formulas.

The intuition behind the degree overflow rules is that we sometimes need to
fix sequents with out-of-range labels, i.e., sequents in which a formula F is assigned
a label k with $\mathrm{opt}_{QCL}(F) < k < \infty$.


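Example 4. The following is an L[QCL]-proof of a valid sequent.³

$$\dfrac{\dfrac{\dfrac{\dfrac{b \lor c, \neg a, b \vdash a \land b, a \land c, b}{b \lor c, (a \vec{\times} b)_2 \vdash a \land b, a \land c, b}\,(\vec{\times} l_2)}{(a \vec{\times} b)_2 \vdash \neg(b \vec{\times} c), a \land b, a \land c, b}\,(\neg r) \qquad \dfrac{\dfrac{a \lor b, \neg b, c \vdash a \land b, a \land c, b}{a \lor b, (b \vec{\times} c)_2 \vdash a \land b, a \land c, b}\,(\vec{\times} l_2)}{(b \vec{\times} c)_2 \vdash \neg(a \vec{\times} b), a \land b, a \land c, b}\,(\neg r)}{((a \vec{\times} b) \land (b \vec{\times} c))_2 \vdash a \land b, a \land c, b}\,(\land l)}{\neg(a \land b), ((a \vec{\times} b) \land (b \vec{\times} c))_2 \vdash a \land c, b}\,(\neg l)$$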


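Example 5. The following proof shows how the $(\land r)$ rule can introduce more than
three premises. Note that we make use of the (dol) rule in the leftmost branch.

$$\dfrac{\dfrac{a, b, \bot \vdash}{a, b, (a)_2 \vdash}\,(dol) \quad a, b, \neg a \vdash \quad \dfrac{a, b, c, \neg b \vdash}{a, b, (b \vec{\times} c)_2 \vdash}\,(\vec{\times} l_2) \quad \dfrac{a, b \vdash b \lor c}{a, b, \neg(b \vec{\times} c) \vdash}\,(\neg l) \quad \dfrac{a, b \vdash a, b}{a, b \vdash a, (b \vec{\times} c)_1}\,(\vec{\times} r_1)}{a, b \vdash (a \land (b \vec{\times} c))_1}\,(\land r)$$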


We now show soundness and completeness of L[QCL]. 


² (dol)/(dor) stands for degree overflow left/right.

³ Note that, once we reach sequents containing only classical formulas, we do not
continue the proof. However, it can be verified that the classical sequents on the left
and right branches are provable in this case. Moreover, given a formula $(A)_1$ with a
label of 1, the label is often omitted for readability.
Proposition 1. L[QCL] is sound.

Proof. We show for all rules that they are sound.

For (ax) and the structural rules this is clearly the case.

$(\neg l)$ and $(\neg r)$: follows from the fact that $\deg_{QCL}(\mathcal{I}, F) < \infty \iff \mathcal{I} \models cp(F)$
for all QCL-formulas F.

$(\lor l)$: Assume that the conclusion of the rule is not valid, i.e., there is a model
M of $\Gamma$ and $(A \lor B)_k$ that is not a model of $\Delta$. Then M satisfies either A or
B to degree k and neither to a degree smaller than k. Assume M satisfies A
to a degree of k; the other case is symmetric. Then M is a model of $\Gamma$ and
$(A)_k$ but, by assumption, neither of $\Delta$ nor of $(B)_j$ for any $j < k$. Hence at
least one of the premises is not valid. Analogously for $(\land l)$.

$(\lor r)$: Assume there is a model M of $\Gamma$ that is not a model of $\Delta$ or of $(A \lor B)_k$.
There are two possible reasons why M is not a model of $(A \lor B)_k$: (1) M satisfies
neither A nor B to degree k. But then the premise $\Gamma \vdash (A)_k, (B)_k, \Delta$ is not
valid, as M is also not a model of $\Delta$ by assumption. (2) M satisfies A or
B to a degree smaller than k. Assume that M satisfies A to degree $j < k$ (the
other case is symmetric). Then the premise $\Gamma, (A)_j \vdash \Delta$ is not valid. Indeed,
M is a model of $\Gamma$ and $(A)_j$ but not of $\Delta$. Analogously for $(\land r)$.

$(\vec{\times} l_1)$ and $(\vec{\times} r_1)$: follows from the fact that $(A)_k$ has the same models as
$(A \vec{\times} B)_k$ for $k \le \mathrm{opt}_{QCL}(A)$.

$(\vec{\times} l_2)$: Assume the conclusion of the rule is not valid, and let M be the model
witnessing this. Then M is a model of $(A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l}$. By definition, M
satisfies B to degree l and is not a model of A. However, then it is also a
model of $\Gamma$, $(B)_l$ and $(\neg A)_1$, but not of $\Delta$, which means that the premise is not valid.

$(\vec{\times} r_2)$: Assume that both premises are valid, i.e., every model of $\Gamma$ that is not a
model of $\Delta$ is a model of both $(\neg A)_1$ and $(B)_l$, with $l \le \mathrm{opt}_{QCL}(B)$. Now, by definition,
any model of $(\neg A)_1$ and $(B)_l$ satisfies $A \vec{\times} B$ to degree $\mathrm{opt}_{QCL}(A) + l$. Therefore,
every model of $\Gamma$ is either a model of $\Delta$ or of $(A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l}$, which means
that the conclusion of the rule is valid.

(dol): $\Gamma, \bot$ has no models, i.e., the premise $\Gamma, \bot \vdash \Delta$ is valid. Crucially,
$\Gamma, (A)_{\mathrm{opt}_{QCL}(A)+k}$ has no models either, since A cannot be satisfied to a degree m
with $\mathrm{opt}_{QCL}(A) < m < \infty$; hence the conclusion is valid as well. (dor) is clearly sound.

Proposition 2. L[QCL] is complete.

Proof. We show this by induction over the (aggregated) formula complexity of
the non-classical formulas.


For the base case, we observe that if all formulas are classical and labeled
with 1, then all our rules reduce to the classical sequent calculus, which is
known to be complete. Moreover, we observe that $(A)_1$ is equivalent to A.
Hence, we can turn labeled atoms into classical atoms.

- Assume that a sequent of the form $\Gamma, (A)_{\mathrm{opt}_{QCL}(A)+k} \vdash \Delta$ with $k \in \mathbb{N}$ is valid.
Since $\Gamma, \bot$ has no models, $\Gamma, \bot \vdash \Delta$ is valid and, by the induction hypothesis,
provable. Thus, $\Gamma, (A)_{\mathrm{opt}_{QCL}(A)+k} \vdash \Delta$ is provable using the (dol) rule.
- Assume that a sequent $\Gamma \vdash (A)_{\mathrm{opt}_{QCL}(A)+k}, \Delta$ is valid. Since $(A)_{\mathrm{opt}_{QCL}(A)+k}$ cannot
be satisfied, $\Gamma \vdash \Delta$ is valid and, by the induction hypothesis, provable.
Therefore, $\Gamma \vdash (A)_{\mathrm{opt}_{QCL}(A)+k}, \Delta$ is provable using the (dor) rule.

- Assume that a sequent of the form $\Gamma \vdash (\neg A)_1, \Delta$ is valid. Then every model
of $\Gamma$ is either a model of $(\neg A)_1$ or of $\Delta$. In other words, every model of $\Gamma$ that
is not a model of $(\neg A)_1$ (i.e., is a model of $cp(A)$) is a model of $\Delta$. Therefore,
every interpretation that is a model of both $\Gamma$ and $cp(A)$ must be a model
of $\Delta$. It follows that $\Gamma, cp(A) \vdash \Delta$ is valid and, by the induction hypothesis,
provable. Thus, $\Gamma \vdash (\neg A)_1, \Delta$ is provable using the $(\neg r)$ rule. Similarly for
$\Gamma, (\neg A)_1 \vdash \Delta$.

- Assume that a sequent of the form $\Gamma, (A \lor B)_k \vdash \Delta$ is valid, with $k \le \mathrm{opt}_{QCL}(A \lor B)$.
We claim that then both $\Gamma, (A)_k \vdash (B)_{<k}, \Delta$ and $\Gamma, (B)_k \vdash (A)_{<k}, \Delta$ are
valid. Assume to the contrary that $\Gamma, (A)_k \vdash (B)_{<k}, \Delta$ is not valid (the other
case is symmetric). Then there is a model M of $\Gamma$ and $(A)_k$ that is neither
a model of $(B)_{<k}$ nor of $\Delta$. But then M is also a model of $\Gamma$ and $(A \lor B)_k$,
but not of $\Delta$, which contradicts the assumption that $\Gamma, (A \lor B)_k \vdash \Delta$ is valid.
Therefore, both $\Gamma, (A)_k \vdash (B)_{<k}, \Delta$ and $\Gamma, (B)_k \vdash (A)_{<k}, \Delta$ are valid and,
by the induction hypothesis, provable. This means that $\Gamma, (A \lor B)_k \vdash \Delta$ is
provable by $(\lor l)$. Similarly for a sequent of the form $\Gamma, (A \land B)_k \vdash \Delta$.

- Assume that a sequent of the form $\Gamma \vdash (A \lor B)_k, \Delta$ is valid, with $k \le \mathrm{opt}_{QCL}(A \lor B)$.
We claim that then, for all $i < k$, the sequents $\Gamma, (A)_i \vdash \Delta$ and
$\Gamma, (B)_i \vdash \Delta$ are valid, and that $\Gamma \vdash (A)_k, (B)_k, \Delta$ is valid. Assume by contradiction that
there is an $i < k$ s.t. $\Gamma, (A)_i \vdash \Delta$ is not valid. Then there is a model M of
$\Gamma$ and $(A)_i$ that is not a model of $\Delta$. However, then M is a model of $\Gamma$ but
neither of $\Delta$ nor of $(A \lor B)_k$ (as M satisfies $A \lor B$ to a degree of at most $i < k$), which
contradicts our assumption that $\Gamma \vdash (A \lor B)_k, \Delta$ is valid. The case that there
is an $i < k$ s.t. $\Gamma, (B)_i \vdash \Delta$ is not valid is symmetric. Finally, assume that
$\Gamma \vdash (A)_k, (B)_k, \Delta$ is not valid. Then there is a model M of $\Gamma$ that is not a
model of $(A)_k$, $(B)_k$, or $\Delta$. Then M is a model of $\Gamma$ but neither of $\Delta$ nor of
$(A \lor B)_k$, contradicting our assumption. Therefore, all sequents listed above
must be valid and, by the induction hypothesis, provable; hence $\Gamma \vdash (A \lor B)_k, \Delta$ is provable
by $(\lor r)$. Similarly for a sequent of the form $\Gamma \vdash (A \land B)_k, \Delta$.

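- Assume that a sequent of the form $\Gamma, (A \vec{\times} B)_k \vdash \Delta$ with $k \le \mathrm{opt}_{QCL}(A)$ is
valid. Then $\Gamma, (A)_k \vdash \Delta$ is also valid, since $(A \vec{\times} B)_k$ and $(A)_k$ have the same
models if $k \le \mathrm{opt}_{QCL}(A)$. By the induction hypothesis, $\Gamma, (A)_k \vdash \Delta$ is
provable, and hence so is $\Gamma, (A \vec{\times} B)_k \vdash \Delta$ by $(\vec{\times} l_1)$. Analogously for sequents of the form $\Gamma \vdash (A \vec{\times} B)_k, \Delta$.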

- Assume that a sequent of the form $\Gamma, (A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l} \vdash \Delta$ is valid, with
$l \le \mathrm{opt}_{QCL}(B)$. We claim that the sequent $\Gamma, (B)_l, \neg A \vdash \Delta$ is then also valid.
Indeed, if M is a model of $\Gamma$, $(B)_l$ and $\neg A$, then it is also a model of $\Gamma$ and
$(A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l}$. Hence, by assumption, M must be a model of $\Delta$. From
this, we can conclude as before that $\Gamma, (A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l} \vdash \Delta$ is provable.


- Assume that a sequent of the form $\Gamma \vdash (A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l}, \Delta$ is valid, with
$l \le \mathrm{opt}_{QCL}(B)$. We claim that then also the sequents $\Gamma \vdash \neg A, \Delta$ and $\Gamma \vdash
(B)_l, \Delta$ are valid. Assume by contradiction that the first sequent is not valid.
This means that there is a model M of $\Gamma$ that is a model of neither $\neg A$
nor $\Delta$. However, then M is a model of A and therefore satisfies $A \vec{\times} B$
to a degree of at most $\mathrm{opt}_{QCL}(A)$. This contradicts our assumption that
$\Gamma \vdash (A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l}, \Delta$ is valid. Assume now that the second sequent is
not valid, i.e., that there is a model M of $\Gamma$ that is neither a model of $(B)_l$
nor of $\Delta$. Then M cannot be a model of $(A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l}$, and we again
have a contradiction to our assumption. As before, it follows by the induction
hypothesis that $\Gamma \vdash (A \vec{\times} B)_{\mathrm{opt}_{QCL}(A)+l}, \Delta$ is provable.


So far we have not introduced a cut rule, and, as we have shown, our calculus
is complete without such a rule. However, it is easy to see that cut is
admissible, i.e., L[QCL] can be extended by:

$$\dfrac{\Gamma \vdash (A)_k, \Delta \qquad \Gamma', (A)_k \vdash \Delta'}{\Gamma, \Gamma' \vdash \Delta, \Delta'}\,(cut)$$


Another aspect of our calculus that should be mentioned is that, although
L[QCL] is cut-free, we do not have the subformula property. This is especially
obvious when looking at the rules for negation, where we use the classical
counterpart $cp(A)$ of QCL-formulas. For example, $\neg(a \vec{\times} b)$ in the conclusion of the
$\neg$-left rule becomes $cp(a \vec{\times} b) = a \lor b$ in the premise.

While we believe that L[QCL] is interesting in its own right, the question arises
of how we can use it to obtain a calculus for preferred model entailment.
Essentially, we have to add a rule that allows us to go from standard to
preferred model inferences. As a first approach, we consider theories $\Gamma \cup \{A\}$ with $\Gamma$
consisting only of classical formulas and A being a QCL-formula. In this simple
case, the preferred models of $\Gamma \cup \{A\}$ are those models of $\Gamma \cup \{A\}$ that satisfy A to
the smallest possible degree. One might add the following rule to L[QCL]:


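$$\dfrac{(\Gamma, (A)_i \vdash \bot)_{i<k} \qquad \Gamma, (A)_k \vdash \Delta}{\Gamma, A \mathrel{|\!\sim}^{lex}_{QCL} \Delta}\,(\mu_{naive})$$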


Intuitively, the above rule states that, if there are no interpretations that
satisfy $\Gamma$ while also satisfying A to a degree lower than k, and if $\Delta$ follows from
all models of $\Gamma, (A)_k$, then $\Delta$ is entailed by the preferred models of $\Gamma \cup \{A\}$.
However, the obtained calculus L[QCL] + $(\mu_{naive})$ derives invalid sequents.


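Example 6. The invalid entailment $\neg a, a \vec{\times} b \mathrel{|\!\sim}^{lex}_{QCL} a$ can be derived via $(\mu_{naive})$:

$$\dfrac{\neg a, (a \vec{\times} b)_1 \vdash a}{\neg a, a \vec{\times} b \mathrel{|\!\sim}^{lex}_{QCL} a}\,(\mu_{naive})$$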


What is missing is an assertion that $\Gamma, (A)_k$ is satisfiable. Unfortunately, this
cannot be formulated in L[QCL]. One way of addressing this problem is to define
a refutation calculus, as has been done for other non-monotonic logics [5].
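The failure of $(\mu_{naive})$ in Example 6 can be replayed semantically. The sketch below is our own illustration with the same assumed tuple encoding; $\Gamma$ is restricted to classical formulas, as in the setting of $(\mu_{naive})$. It checks the premises of $(\mu_{naive})$ by brute force and compares them with the actual preferred models of $\Gamma \cup \{A\}$. It also shows that additionally demanding that $\Gamma, (A)_k$ be satisfiable, i.e. the missing assertion mentioned above, blocks the bad derivation. This extra check is only a semantic stand-in, not the rule the refutation calculus eventually provides.

```python
import math
from itertools import combinations

INF = math.inf


def opt(f):
    if isinstance(f, str) or f[0] == "not":
        return 1
    op, a, b = f
    return max(opt(a), opt(b)) if op in ("and", "or") else opt(a) + opt(b)


def deg(i, f):
    if isinstance(f, str):
        return 1 if f in i else INF
    if f[0] == "not":
        return 1 if deg(i, f[1]) == INF else INF
    op, a, b = f
    m, n = deg(i, a), deg(i, b)
    if op == "and":
        return max(m, n)
    if op == "or":
        return min(m, n)
    return m if m < INF else opt(a) + n  # "ox"


INTERPS = [frozenset(c) for r in range(3) for c in combinations("ab", r)]


def models(gamma, a, k):
    """Interpretations satisfying the classical formulas in gamma and (A)_k."""
    return [i for i in INTERPS
            if all(deg(i, g) == 1 for g in gamma) and deg(i, a) == k]


def naive_premises_hold(gamma, a, k, goal):
    """Semantic reading of the premises of (mu_naive) for a single classical goal."""
    no_lower = all(not models(gamma, a, i) for i in range(1, k))  # (G,(A)_i |- bot)_{i<k}
    return no_lower and all(deg(i, goal) == 1 for i in models(gamma, a, k))


def fixed_premises_hold(gamma, a, k, goal):
    """Additionally demand that G,(A)_k is satisfiable, i.e. G,(A)_k |/- bot."""
    return naive_premises_hold(gamma, a, k, goal) and bool(models(gamma, a, k))


def entailed(gamma, a, goal):
    """Preferred models of gamma + {A}: the models that minimize deg(A)."""
    mods = [i for i in INTERPS
            if all(deg(i, g) == 1 for g in gamma) and deg(i, a) < INF]
    if not mods:
        return True  # no models, so everything is entailed
    best = min(deg(i, a) for i in mods)
    return all(deg(i, goal) == 1 for i in mods if deg(i, a) == best)


gamma, a = [("not", "a")], ("ox", "a", "b")  # Example 6
print(naive_premises_hold(gamma, a, 1, "a"))  # True: the premise holds vacuously
print(entailed(gamma, a, "a"))                # False: the only preferred model is {b}
print(any(fixed_premises_hold(gamma, a, k, "a") for k in range(1, opt(a) + 1)))  # False
```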
4 Calculus for Preferred Model Entailment


We now introduce a calculus for preferred model entailment. However, as argued
above, we first need to introduce the refutation calculus L[QCL]~. In the
literature, a rejection method for first-order logic with equality was first introduced
in [17] and proved complete w.r.t. finite model theory. Our refutation calculus
is based on a simpler rejection method for propositional logic defined in [5].
Using the refutation calculus, we prove that $(A)_k$ is satisfiable by deriving the
antisequent $(A)_k \nvdash \bot$.


Definition 12. A labeled QCL-antisequent is denoted by $\Gamma \nvdash \Delta$, and it is valid
if and only if the corresponding labeled QCL-sequent $\Gamma \vdash \Delta$ is not valid, i.e., if
at least one interpretation that satisfies all formulas in $\Gamma$ to the degree specified by the
label satisfies no formula in $\Delta$ to the degree specified by the label.


Below we give a definition of the refutation calculus L[QCL]~. Note that most
rules coincide with their counterparts in L[QCL]. Binary rules are translated
into two rules, one inference rule per premise. (∨r) and (∧l) in L[QCL] have an
unbounded number of premises, but due to their structure they can be translated
into three inference rules. For (∧r) we need to introduce two extra rules for the
case that either A or B is not satisfied.


Definition 13 (L[QCL]~). The axioms of L[QCL]~ are of the form Γ ⊬ Δ,
where Γ and Δ are disjoint sets of atoms and ⊥ ∉ Γ. The inference rules of
L[QCL]~ are given below. Whenever a labeled formula (F)_k appears in the conclusion of an inference rule it holds that k ≤ opt_QCL(F).


The logical rules are: 


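$$\frac{\Gamma, (\mathrm{cp}(A))_1 \nvdash \Delta}{\Gamma \nvdash (\neg A)_1, \Delta}\ (\neg r) \qquad \frac{\Gamma \nvdash (\mathrm{cp}(A))_1, \Delta}{\Gamma, (\neg A)_1 \nvdash \Delta}\ (\neg l)$$

$$\frac{\Gamma, (A)_k \nvdash (B)_{<k}, \Delta}{\Gamma, (A \lor B)_k \nvdash \Delta}\ (\lor l_1) \qquad \frac{\Gamma, (B)_k \nvdash (A)_{<k}, \Delta}{\Gamma, (A \lor B)_k \nvdash \Delta}\ (\lor l_2)$$

$$\frac{\Gamma, (A)_i \nvdash \Delta}{\Gamma \nvdash (A \lor B)_k, \Delta}\ (\lor r_1) \qquad \frac{\Gamma, (B)_i \nvdash \Delta}{\Gamma \nvdash (A \lor B)_k, \Delta}\ (\lor r_2) \qquad \frac{\Gamma \nvdash (A)_k, (B)_k, \Delta}{\Gamma \nvdash (A \lor B)_k, \Delta}\ (\lor r_3)$$

where i < k.

$$\frac{\Gamma, (A)_k \nvdash (B)_{>k}, (\neg B)_1, \Delta}{\Gamma, (A \land B)_k \nvdash \Delta}\ (\land l_1) \qquad \frac{\Gamma, (B)_k \nvdash (A)_{>k}, (\neg A)_1, \Delta}{\Gamma, (A \land B)_k \nvdash \Delta}\ (\land l_2)$$

$$\frac{\Gamma, (A)_i \nvdash \Delta}{\Gamma \nvdash (A \land B)_k, \Delta}\ (\land r_1) \qquad \frac{\Gamma, (B)_i \nvdash \Delta}{\Gamma \nvdash (A \land B)_k, \Delta}\ (\land r_2) \qquad \frac{\Gamma, (\neg A)_1 \nvdash \Delta}{\Gamma \nvdash (A \land B)_k, \Delta}\ (\land r_3)$$

$$\frac{\Gamma, (\neg B)_1 \nvdash \Delta}{\Gamma \nvdash (A \land B)_k, \Delta}\ (\land r_4) \qquad \frac{\Gamma \nvdash (A)_k, (B)_k, \Delta}{\Gamma \nvdash (A \land B)_k, \Delta}\ (\land r_5)$$

where i > k.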


The rules for ordered disjunction, with k ≤ opt_QCL(A) and l ≤ opt_QCL(B), are:


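$$\frac{\Gamma, (A)_k \nvdash \Delta}{\Gamma, (A \vec{\times} B)_k \nvdash \Delta}\ (\vec{\times} l_1) \qquad \frac{\Gamma, (B)_l, (\neg A)_1 \nvdash \Delta}{\Gamma, (A \vec{\times} B)_{\mathrm{opt}_{\mathrm{QCL}}(A)+l} \nvdash \Delta}\ (\vec{\times} l_2)$$

$$\frac{\Gamma \nvdash (A)_k, \Delta}{\Gamma \nvdash (A \vec{\times} B)_k, \Delta}\ (\vec{\times} r_1) \qquad \frac{\Gamma \nvdash (\neg A)_1, \Delta}{\Gamma \nvdash (A \vec{\times} B)_{\mathrm{opt}_{\mathrm{QCL}}(A)+l}, \Delta}\ (\vec{\times} r_2) \qquad \frac{\Gamma \nvdash (B)_l, \Delta}{\Gamma \nvdash (A \vec{\times} B)_{\mathrm{opt}_{\mathrm{QCL}}(A)+l}, \Delta}\ (\vec{\times} r_3)$$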
The degree overflow rules, with k ∈ ℕ, are:


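$$\frac{\Gamma, \bot \nvdash \Delta}{\Gamma, (A)_{\mathrm{opt}_{\mathrm{QCL}}(A)+k} \nvdash \Delta}\ (\mathrm{do}l) \qquad \frac{\Gamma \nvdash \Delta}{\Gamma \nvdash (A)_{\mathrm{opt}_{\mathrm{QCL}}(A)+k}, \Delta}\ (\mathrm{do}r)$$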


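Example 7. The following is related to Example 4 and shows that the labeled
formulas ¬(a ∧ b), ((a × b) ∧ (b × c))_2 are jointly satisfiable.

$$\dfrac{\dfrac{\dfrac{\dfrac{(a \lor b), c, \neg b \nvdash a \land b, \bot}{(a \lor b), (b \vec{\times} c)_2 \nvdash a \land b, \bot}}{(b \vec{\times} c)_2 \nvdash \neg(a \vec{\times} b), a \land b, \bot}}{((a \vec{\times} b) \land (b \vec{\times} c))_2 \nvdash a \land b, \bot}}{\neg(a \land b), ((a \vec{\times} b) \land (b \vec{\times} c))_2 \nvdash \bot}$$

Note that the interpretation {a, c} witnesses (a ∨ b), c, ¬b ⊬ a ∧ b, ⊥.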
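Validity of small labeled sequents and antisequents can also be checked by brute force, which is a convenient way to test derivations such as the one above. The Python sketch below is a hypothetical helper and not part of the calculus. It encodes formulas as tuples (with ("bot",) for ⊥), gives unlabeled formulas the label 1, and assumes the standard choice-logic degree clauses (max for ∧, min for ∨, opt(A) + deg(B) for A × B when A fails) together with degree 1 for ¬F exactly when F fails. An antisequent is valid iff some model satisfies every antecedent label and no succedent label.

```python
from itertools import product
from math import inf

BOT = ("bot",)   # stands for ⊥, which no model satisfies

def atoms(f):
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def opt(f):
    if f[0] in ("atom", "not", "bot"):
        return 1
    if f[0] in ("and", "or"):
        return max(opt(f[1]), opt(f[2]))
    return opt(f[1]) + opt(f[2])               # ("odis", F, G): F × G

def deg(f, model):
    tag = f[0]
    if tag == "bot":
        return inf
    if tag == "atom":
        return 1 if f[1] in model else inf
    if tag == "not":
        return 1 if deg(f[1], model) == inf else inf
    if tag == "and":
        return max(deg(f[1], model), deg(f[2], model))
    if tag == "or":
        return min(deg(f[1], model), deg(f[2], model))
    left = deg(f[1], model)
    if left < inf:
        return left
    right = deg(f[2], model)
    return opt(f[1]) + right if right < inf else inf

def interpretations(names):
    names = sorted(names)
    for bits in product((False, True), repeat=len(names)):
        yield frozenset(n for n, bit in zip(names, bits) if bit)

def is_witness(model, gamma, delta):
    """model satisfies every labeled formula (F, k) in gamma and none in delta."""
    return (all(deg(f, model) == k for f, k in gamma)
            and not any(deg(f, model) == k for f, k in delta))

def antisequent_valid(gamma, delta):
    names = set().union(*(atoms(f) for f, _ in gamma + delta))
    return any(is_witness(m, gamma, delta) for m in interpretations(names))

def sequent_valid(gamma, delta):
    return not antisequent_valid(gamma, delta)

a, b, c = ("atom", "a"), ("atom", "b"), ("atom", "c")
choice = ("and", ("odis", a, b), ("odis", b, c))
gamma = [(("not", ("and", a, b)), 1), (choice, 2)]
print(antisequent_valid(gamma, [(BOT, 1)]))                     # True
print(is_witness(frozenset({"a", "c"}), gamma, [(BOT, 1)]))     # True
```

The search may find a different witness first (here {b} also works). The calculus only needs that one exists.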
Proposition 3. L[QCL]~ is sound. 


Proof. The soundness of the negation rules is straightforward. The soundness of
the rules (×l1), (×l2) and (×r) follows by the same argument as for L[QCL].
For the remaining rules, it is easy to check that the same model witnessing the 
validity of the premise also witnesses the validity of the conclusion. 


Proposition 4. L[QCL]~ is complete. 


Proof. We show completeness by an induction over the (aggregated) formula
complexity. Assume Γ ⊬ Δ is valid, i.e., Γ ⊢ Δ is not valid. Now, there must be a
rule in L[QCL] for which Γ ⊢ Δ is the conclusion. By the soundness of L[QCL],
this implies that at least one of the premises Γ* ⊢ Δ* is not valid. However, then
Γ* ⊬ Δ* is valid and, by induction, also provable. Now, by the construction of
L[QCL]~, there is a rule that allows us to derive Γ ⊬ Δ from Γ* ⊬ Δ*.


So far no cut rule has been introduced for L[QCL]~, and indeed, a counterpart of
the cut rule would not be sound. One possibility is to introduce a contrapositive
of cut as described by Bonatti and Olivetti [5]. Again, it is easy to see that this
rule is admissible in our calculus:

$$\frac{\Gamma \nvdash \Delta \qquad \Gamma, (A)_k \vdash \Delta}{\Gamma \nvdash (A)_k, \Delta}\ (\mathrm{cut}_2)$$


We are now ready to combine L[QCL] and L[QCL]~ by defining an inference rule
that allows us to go from labeled sequents to non-monotonic inferences. Again,
we first consider the case where Γ is classical and A is a choice logic formula.
The preferred model inference rule is:


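$$\frac{(\Gamma, (A)_i \vdash \bot)_{i<k} \qquad \Gamma, (A)_k \nvdash \bot \qquad \Gamma, (A)_k \vdash \Delta}{\Gamma, A \mathrel{|\!\sim} \Delta}\ (\mathrel{|\!\sim}_{\mathrm{simple}})$$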
Intuitively, the premises (Γ, (A)_i ⊢ ⊥)_{i<k} along with Γ, (A)_k ⊬ ⊥ ensure that
models satisfying A to a degree of k are preferred, while the premise Γ, (A)_k ⊢ Δ
ensures that Δ is entailed by those preferred models.


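Example 8. The valid entailment ¬(a ∧ b), (a × b) ∧ (b × c) |~ a ∧ c, b is provable
by choosing k = 2:

$$\frac{\overset{(\pi_1)}{\Gamma, ((a \vec{\times} b) \land (b \vec{\times} c))_1 \vdash \bot} \qquad \overset{(\pi_2)}{\Gamma, ((a \vec{\times} b) \land (b \vec{\times} c))_2 \nvdash \bot} \qquad \overset{(\pi_3)}{\Gamma, ((a \vec{\times} b) \land (b \vec{\times} c))_2 \vdash \Delta}}{\Gamma, (a \vec{\times} b) \land (b \vec{\times} c) \mathrel{|\!\sim} \Delta}\ (\mathrel{|\!\sim}_{\mathrm{simple}})$$

with Γ = ¬(a ∧ b) and Δ = a ∧ c, b. π3 is the L[QCL]-proof from Example 4 and
π2 is the L[QCL]~-proof from Example 7. π1 is not shown explicitly, but it can
be verified that the corresponding sequent is provable.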


We extend (|~simple) to the more general case, where more than one non-classical
formula may be present, to obtain a calculus for preferred model entailment. An
additional rule (|~unsat) is needed in case a theory is classically unsatisfiable.


Definition 14 (L[QCL]^{|~}). Let <_lex be the order on vectors in ℕ^k defined by

- v <_lex w if there is some n ∈ ℕ such that v has more entries of value n than w,
and for all 1 ≤ m < n both vectors have the same number of entries of value m.
- v =_lex w if, for all n ∈ ℕ, v and w have the same number of entries of value n.

L[QCL]^{|~} consists of the axioms and rules of L[QCL] and L[QCL]~ plus the
following rules, where v, w ∈ ℕ^k, Γ consists of only classical formulas, and
every A_i with 1 ≤ i ≤ k is a QCL-formula:

$$\frac{(\Gamma, (A_1)_{w_1}, \dots, (A_k)_{w_k} \vdash \bot)_{w <_{\mathrm{lex}} v} \qquad \Gamma, (A_1)_{v_1}, \dots, (A_k)_{v_k} \nvdash \bot \qquad (\Gamma, (A_1)_{w_1}, \dots, (A_k)_{w_k} \vdash \Delta)_{w =_{\mathrm{lex}} v}}{\Gamma, A_1, \dots, A_k \mathrel{|\!\sim} \Delta}\ (\mathrel{|\!\sim}_{\mathrm{lex}})$$

$$\frac{\Gamma, \mathrm{cp}(A_1), \dots, \mathrm{cp}(A_k) \vdash \bot}{\Gamma, A_1, \dots, A_k \mathrel{|\!\sim} \Delta}\ (\mathrel{|\!\sim}_{\mathrm{unsat}})$$
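The order <_lex in Definition 14 only looks at how often each value occurs in a vector, so it can be implemented with multiset counts. The Python sketch below uses our own helper names. Given per-position upper bounds on the degrees, it splits all candidate vectors w into those that call for a premise ending in ⊢ ⊥ (w <_lex v) and those that call for a premise ending in ⊢ Δ (w =_lex v) in (|~lex). Using the opt values as bounds is an assumption made here so that the enumeration is finite.

```python
from collections import Counter
from itertools import product

def lex_less(v, w):
    """v <_lex w: at the smallest value whose counts differ, v has more entries."""
    cv, cw = Counter(v), Counter(w)
    for n in sorted(set(cv) | set(cw)):
        if cv[n] != cw[n]:
            return cv[n] > cw[n]
    return False

def lex_equal(v, w):
    """v =_lex w: every value occurs equally often in both vectors."""
    return Counter(v) == Counter(w)

def premise_vectors(v, bounds):
    """Vectors w with 1 <= w_i <= bounds[i], split into (w <_lex v, w =_lex v)."""
    ws = list(product(*(range(1, bound + 1) for bound in bounds)))
    below = [w for w in ws if lex_less(w, v)]
    equal = [w for w in ws if lex_equal(w, v)]
    return below, equal

below, equal = premise_vectors((1, 1, 2), (1, 2, 2))
print(below)   # [(1, 1, 1)]
print(equal)   # [(1, 1, 2), (1, 2, 1)]
```

With the opt values (1, 2, 2) of ¬(a ∧ b), a × b and b × c, this reproduces the vectors used in the example that follows: one premise ending in ⊢ ⊥ and two premises ending in ⊢ Δ.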


We first provide a small example and then show soundness and completeness. 


Example 9. Consider the valid entailment ¬(a ∧ b), (a × b), (b × c) |~ a ∧ c, b,
similar to Example 8, but with the information that we require (a × b) and (b × c)
encoded as separate formulas. It is not possible to satisfy all formulas on the left
to a degree of 1. Rather, it is optimal to either satisfy (¬(a ∧ b))_1, (a × b)_1, (b × c)_2
or, alternatively, (¬(a ∧ b))_1, (a × b)_2, (b × c)_1. We choose v = (1, 1, 2), with
w = (1, 1, 1) being the only vector w s.t. w <_lex v. Thus, we get

$$\frac{\Gamma, (a \vec{\times} b)_1, (b \vec{\times} c)_1 \vdash \bot \qquad \Gamma, (a \vec{\times} b)_1, (b \vec{\times} c)_2 \nvdash \bot \qquad \Gamma, (a \vec{\times} b)_1, (b \vec{\times} c)_2 \vdash \Delta \qquad \Gamma, (a \vec{\times} b)_2, (b \vec{\times} c)_1 \vdash \Delta}{\Gamma, a \vec{\times} b, b \vec{\times} c \mathrel{|\!\sim} \Delta}\ (\mathrel{|\!\sim}_{\mathrm{lex}})$$

with Γ = ¬(a ∧ b) and Δ = a ∧ c, b. It can be verified that indeed all branches
are provable, but we do not show this explicitly here.
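As a semantic sanity check of Example 9, the sketch below computes lexicographically preferred models directly, without the calculus. A model is kept when no other model has a <_lex-smaller degree vector. The tuple encoding, the function names and the degree clauses (max for ∧, min for ∨, opt(A) + deg(B) for A × B when A fails, degree 1 for ¬F exactly when F fails) are our own assumptions. The search is exponential in the number of atoms and is only meant for toy theories.

```python
from collections import Counter
from itertools import product
from math import inf

def atoms(f):
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def opt(f):
    if f[0] in ("atom", "not"):
        return 1
    if f[0] in ("and", "or"):
        return max(opt(f[1]), opt(f[2]))
    return opt(f[1]) + opt(f[2])               # ("odis", F, G): F × G

def deg(f, model):
    tag = f[0]
    if tag == "atom":
        return 1 if f[1] in model else inf
    if tag == "not":
        return 1 if deg(f[1], model) == inf else inf
    if tag == "and":
        return max(deg(f[1], model), deg(f[2], model))
    if tag == "or":
        return min(deg(f[1], model), deg(f[2], model))
    left = deg(f[1], model)
    if left < inf:
        return left
    right = deg(f[2], model)
    return opt(f[1]) + right if right < inf else inf

def interpretations(names):
    names = sorted(names)
    for bits in product((False, True), repeat=len(names)):
        yield frozenset(n for n, bit in zip(names, bits) if bit)

def lex_less(v, w):
    cv, cw = Counter(v), Counter(w)
    for n in sorted(set(cv) | set(cw)):
        if cv[n] != cw[n]:
            return cv[n] > cw[n]
    return False

def preferred_lex(theory):
    """Models of theory whose degree vector is not <_lex-beaten by another model."""
    names = set().union(*(atoms(f) for f in theory))
    models = [m for m in interpretations(names)
              if all(deg(f, m) < inf for f in theory)]
    vector = {m: tuple(deg(f, m) for f in theory) for m in models}
    return [m for m in models
            if not any(lex_less(vector[n], vector[m]) for n in models)]

def entails_lex(theory, delta):
    return all(any(deg(d, m) < inf for d in delta) for m in preferred_lex(theory))

a, b, c = ("atom", "a"), ("atom", "b"), ("atom", "c")
theory = [("not", ("and", a, b)), ("odis", a, b), ("odis", b, c)]
print(sorted(sorted(m) for m in preferred_lex(theory)))   # [['a', 'c'], ['b'], ['b', 'c']]
print(entails_lex(theory, [("and", a, c), b]))            # True
print(entails_lex(theory, [a]))                           # False
```

All three models have degree vectors that are lex-equal to (1, 1, 2), which is why the example needs exactly one premise below v and two premises lex-equal to v.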
Proposition 5. L[QCL]^{|~} is sound.


Proof. Consider first the (|~lex)-rule and assume that all premises are derivable.
By the soundness of L[QCL] and L[QCL]~ they are also valid. From the first
set of premises (Γ, (A_1)_{w_1}, ..., (A_k)_{w_k} ⊢ ⊥)_{w <_lex v} we can conclude that if there is
some model M of Γ that satisfies A_i to a degree of v_i for all 1 ≤ i ≤ k, then
M ∈ Prf^lex_QCL(Γ ∪ {A_1, ..., A_k}). The premise Γ, (A_1)_{v_1}, ..., (A_k)_{v_k} ⊬ ⊥ ensures
that there is such a model M. By the last set of premises (Γ, (A_1)_{w_1}, ..., (A_k)_{w_k} ⊢
Δ)_{w =_lex v}, we can conclude that all models of Γ ∪ {A_1, ..., A_k} that are equally
as preferred as M, i.e., all M' ∈ Prf^lex_QCL(Γ ∪ {A_1, ..., A_k}), satisfy at least one
formula in Δ. Therefore, Γ, A_1, ..., A_k |~ Δ is valid.

Now consider the (|~unsat)-rule and assume that Γ, cp(A_1), ..., cp(A_k) ⊢ ⊥ is
derivable and therefore valid. Thus, Γ ∪ {A_1, ..., A_k} has no models and therefore
also no preferred models. Then Γ, A_1, ..., A_k |~ Δ is valid.


Proposition 6. L[QCL]^{|~} is complete.
Proof. Assume that Γ, A_1, ..., A_k |~ Δ is valid. If Γ ∪ {A_1, ..., A_k} is unsatisfiable,
then Γ, cp(A_1), ..., cp(A_k) ⊢ ⊥ is valid, i.e., we can apply the (|~unsat)-rule.
Now consider the case that Γ ∪ {A_1, ..., A_k} is satisfiable and assume that
some preferred model M of Γ ∪ {A_1, ..., A_k} satisfies A_i to a degree of v_i for
all 1 ≤ i ≤ k. Then, we claim that all premises of the (|~lex)-rule for this v are
valid and, by the completeness of L[QCL] and L[QCL]~, also derivable.

Assume by contradiction that one of the premises is not valid. First, consider
the case that Γ, (A_1)_{w_1}, ..., (A_k)_{w_k} ⊢ ⊥ is not valid for some w <_lex v. Then there
is a model M' of Γ that satisfies A_i to a degree of w_i for all 1 ≤ i ≤ k. However,
this contradicts the assumption that M is a preferred model of Γ ∪ {A_1, ..., A_k}.

Next, assume that Γ, (A_1)_{v_1}, ..., (A_k)_{v_k} ⊬ ⊥ is not valid. However, M satisfies
Γ, (A_1)_{v_1}, ..., (A_k)_{v_k} and does not satisfy ⊥. Contradiction.

Finally, we assume that Γ, (A_1)_{w_1}, ..., (A_k)_{w_k} ⊢ Δ is not valid for some
w =_lex v. Then, there is a model M' of Γ that satisfies A_i to a degree of w_i for all
1 ≤ i ≤ k but does not satisfy any formula in Δ. But M' is a preferred model of
Γ ∪ {A_1, ..., A_k}, which contradicts Γ, A_1, ..., A_k |~ Δ being valid.


In this paper, we focused on the lexicographic semantics for preferred models of
choice logic theories. However, rules for other semantics, e.g., a rule (|~inc) for the
inclusion-based approach (cf. Definition 8), can be obtained by simply adapting
the way in which vectors over ℕ^k are compared (cf. Definition 14).


5 Beyond QCL 


QCL was the first choice logic to be described [7], and applications concerned 
with QCL and ordered disjunction have been discussed in the literature [3,8,13]. 
For this reason, the main focus in this paper lies with QCL. However, as we 
have seen in Sect. 2, CCL and its ordered conjunction show that interesting
logics similar to QCL exist. We will now demonstrate that L[QCL] can easily be
adapted for other choice logics. In particular, we introduce L[CCL] in which the
rules of L[QCL] for the classical connectives can be retained. All that is needed is
to replace the ×-rules by appropriate rules for the choice connective ⊙ of CCL.


Definition 15 (L[CCL]). L[CCL] is L[QCL], except that the ×-rules are
replaced by the following ⊙-rules:


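$$\frac{\Gamma, (A)_1, (B)_k \vdash \Delta}{\Gamma, (A \vec{\odot} B)_k \vdash \Delta}\ (\vec{\odot} l_1) \qquad \frac{\Gamma, (A)_l, (\neg B)_1 \vdash \Delta}{\Gamma, (A \vec{\odot} B)_{\mathrm{opt}_{\mathrm{CCL}}(B)+l} \vdash \Delta}\ (\vec{\odot} l_2) \qquad \frac{\Gamma, (A)_m \vdash \Delta}{\Gamma, (A \vec{\odot} B)_{\mathrm{opt}_{\mathrm{CCL}}(B)+m} \vdash \Delta}\ (\vec{\odot} l_3)$$

$$\frac{\Gamma \vdash (A)_1, \Delta \qquad \Gamma \vdash (B)_k, \Delta}{\Gamma \vdash (A \vec{\odot} B)_k, \Delta}\ (\vec{\odot} r_1) \qquad \frac{\Gamma \vdash (A)_l, \Delta \qquad \Gamma \vdash (\neg B)_1, \Delta}{\Gamma \vdash (A \vec{\odot} B)_{\mathrm{opt}_{\mathrm{CCL}}(B)+l}, \Delta}\ (\vec{\odot} r_2) \qquad \frac{\Gamma \vdash (A)_m, \Delta}{\Gamma \vdash (A \vec{\odot} B)_{\mathrm{opt}_{\mathrm{CCL}}(B)+m}, \Delta}\ (\vec{\odot} r_3)$$

where k ≤ opt_CCL(B), l ≤ opt_CCL(A), and 1 < m ≤ opt_CCL(A).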


Note that, given Γ, (A ⊙ B)_{opt_CCL(B)+m} ⊢ Δ with 1 < m ≤ opt_CCL(A), we need
to guess whether (⊙l2) or (⊙l3) has to be applied. We do not define L[CCL]~ here,
but the necessary rules for ⊙ can be inferred from the ⊙-rules of L[CCL] in a
similar way to how L[QCL]~ was derived from L[QCL].


Proposition 7. L[CCL] is sound. 
Proof. We consider the newly introduced rules. 


- For (⊙l1), (⊙l2), and (⊙l3) this follows directly from the definition of CCL.

- (⊙r1). Assume both premises are valid, i.e., every model of Γ is a model of Δ
or of (A)_1 and (B)_k with k ≤ opt_CCL(B). By definition, any model that satisfies
(A)_1 and (B)_k satisfies A ⊙ B to degree k. Thus, every model of Γ is a model
of Δ or of (A ⊙ B)_k, which means the conclusion of the rule is valid.

- (⊙r2). Assume both premises are valid, i.e., every model of Γ is either a
model of Δ or of (A)_l and (¬B)_1 with l ≤ opt_CCL(A). By definition, any
model that satisfies (A)_l and does not satisfy B (and hence satisfies (¬B)_1)
satisfies A ⊙ B to degree opt_CCL(B) + l.

- (⊙r3). Assume that the premise is valid, i.e., every model of Γ is either
a model of Δ or of (A)_m with 1 < m ≤ opt_CCL(A). By definition, any
model that satisfies (A)_m, regardless of what degree this model ascribes to
B, satisfies A ⊙ B to degree opt_CCL(B) + m.


Proposition 8. L[CCL] is complete. 


Proof. We adapt the induction of the proof of Proposition 2: 


- Assume that a sequent of the form Γ, (A ⊙ B)_k ⊢ Δ is valid, with k ≤ opt_CCL(B).
All models that satisfy (A ⊙ B)_k must satisfy A to a degree of 1 and B to a
degree of k. Thus, Γ, (A)_1, (B)_k ⊢ Δ is valid and, by the induction hypothesis,
provable; an application of (⊙l1) then shows that Γ, (A ⊙ B)_k ⊢ Δ is provable.
Similarly for the cases Γ, (A ⊙ B)_{opt_CCL(B)+l} ⊢ Δ with l ≤ opt_CCL(A), and
Γ, (A ⊙ B)_{opt_CCL(B)+m} ⊢ Δ with 1 < m ≤ opt_CCL(A).
- Assume that a sequent of the form Γ ⊢ (A ⊙ B)_k, Δ is valid, with k ≤ opt_CCL(B).
We claim that then Γ ⊢ (A)_1, Δ and Γ ⊢ (B)_k, Δ are valid. Assume, for the
sake of a contradiction, that the first sequent is not valid. This means that
there is a model M of Γ that is neither a model of (A)_1 nor of Δ. However,
then M either does not satisfy A ⊙ B at all or satisfies it to a degree higher
than opt_CCL(B). This contradicts the assumption that Γ ⊢ (A ⊙ B)_k, Δ is valid.
Assume now that the second sequent is not valid, i.e., that there is a model M
of Γ that is neither a model of (B)_k nor of Δ. Then M cannot be a model of
(A ⊙ B)_k, contradicting the assumption. As before, it follows by the induction
hypothesis and (⊙r1) that Γ ⊢ (A ⊙ B)_k, Δ is provable. Similarly for the cases
Γ ⊢ (A ⊙ B)_{opt_CCL(B)+l}, Δ with l ≤ opt_CCL(A), and Γ ⊢ (A ⊙ B)_{opt_CCL(B)+m}, Δ
with 1 < m ≤ opt_CCL(A).


We are confident that our methods can be adapted not only for QCL and CCL,
but for numerous other instantiations of the choice logic framework defined in
Sect. 2. We mention here lexicographic choice logic (LCL) [4], in which A ◇ B
expresses that it is best to satisfy A and B, second best to satisfy only A, third
best to satisfy only B, and unacceptable to satisfy neither.

Moreover, note that the inference rules (|~lex) and (|~unsat) (cf. Definition 14) do
not depend on any specific choice logic. Thus, once labeled calculi are developed
for a choice logic, a calculus for preferred model entailment follows immediately.


6 Conclusion 


In this paper we introduce a sound and complete sequent calculus for preferred 
model entailment in QCL. This non-monotonic calculus is built on two calculi: 
a monotonic labeled sequent calculus and a corresponding refutation calculus. 

Our systems are modular and can easily be adapted: on the one hand, calculi 
for choice logics other than QCL can be obtained by introducing suitable rules for 
the choice connectives of the new logic, as exemplified with our calculus for CCL; 
on the other hand, a non-monotonic calculus for preferred model semantics other 
than the lexicographic semantics can be obtained by adapting the inference rule
(|~lex), which transitions from preferred model entailment to the labeled calculi.

Our work contributes to the line of research on non-monotonic sequent calculi 
that make use of refutation systems [5]. Our system is the first proof calculus 
for choice logics, which have been studied mainly from the viewpoint of their 
computational properties [4] and their potential applications [3,8,13] so far. 

Regarding future work, we aim to investigate the proof complexity of our 
calculi, and how this complexity might depend on which choice logic or preferred 
model semantics is considered. Also, calculi for other choice logics such as LCL 
could be explicitly defined, as was done with CCL in Sect. 5. 
Abstract. Lash is a higher-order automated theorem prover created as
a fork of the theorem prover Satallax. The basic underlying calculus of
Satallax is a ground tableau calculus whose rules only use shallow information about the terms and formulas taking part in the rule. Lash uses
new, efficient C representations of vital structures and operations. Most
importantly, Lash uses a C representation of (normal) terms with perfect sharing along with a C implementation of normalizing substitutions.
We describe the ways in which Lash differs from Satallax and the performance improvement of Lash over Satallax when used with analogous flag
settings. With a 10s timeout Lash outperforms Satallax on a collection of
TH0 problems from the TPTP. We conclude with ideas for continuing
the development of Lash.


Keywords: Higher-order logic · Automated reasoning · TPTP


1 Introduction 


Satallax [4,7] is an automated theorem prover for higher-order logic that was a 
top competitor in the THF division of CASC [10] for most of the 2010s. The basic 
calculus of Satallax is a complete ground tableau calculus [2,5,6]. In recent years
the top systems of the THF division of CASC have primarily been based on resolution
and superposition [3,8,11]. At the moment it is an open question whether there
is a research and development path via which a tableau-based prover could again
become competitive. As a first step towards answering this question we have created a fork of Satallax, called Lash, focused on giving efficient C implementations
of data structures and operations needed for search in the basic calculus. 
Satallax was partly competitive due to (optional) additions that went beyond 
the basic calculus. Three of the most successful additions were the use of higher- 
order pattern clauses during search, the use of higher-order unification as a 
heuristic to suggest instantiations at function types and the use of the first-order theorem prover E as a backend to try to prove that the first-order part of the
current state is already unsatisfiable. Satallax includes flags that can be used to
activate or deactivate such additions so that search only uses the basic calculus. 
They are deactivated by default. Satallax has three representations of terms in 
Ocaml. The basic calculus rules use the primary representation. Higher-order 
unification and pattern clauses make use of a representation that includes a
case for metavariables to be instantiated. Communication with E uses a third 
representation restricted to first-order terms and formulas. When only the basic 
calculus is used, only the primary representation is needed. 

Assuming only the basic calculus is used, only limited information about
(normal) terms is needed during the search. Typically we only need to know the 
outer structure of the principal formulas of each rule, and so the full term does 
not need to be traversed. In some cases Satallax either implicitly or explicitly 
traverses the term. The implicit cases are when a rule needs to know if two 
terms are equal. In Satallax, Ocaml’s equality is used to test for equality of 
terms, implicitly relying on a recursion over the term. The explicit cases are 
quantifier rules that instantiate with either a term or a fresh constant. In the 
former case we may also need to normalize the result after instantiating with a 
term. 

In order to give an optimized implementation of the basic calculus we have 
created a new theorem prover, Lash¹, by forking a recent version of Satallax
(Satallax 3.4), the last version that won the THF division of CASC (in 2019). 
Generally speaking, we have removed all the additional code that goes beyond 
the basic calculus. In particular we do not need terms with metavariables since we 
support neither pattern clauses nor higher-order unification in Lash. Likewise we 
do not need a special representation for first-order terms and formulas since Lash 
does not communicate with E. We have added efficient C implementations of 
(normal) terms with perfect sharing. Additionally we have added new efficient C 
implementations of priority queues and the association of formulas with integers 
(to communicate with MiniSat). To measure the speedup given by the new parts 
of the implementation we have run Satallax 3.4 using flag settings that only 
use the basic calculus and Lash 1.0 using the same flag settings. We have also 
compared Lash to Satallax 3.4 using Satallax’s default strategy with a timeout of 
10s, and have found that Lash 1.0 outperforms Satallax with this short timeout 
even when Satallax is using the optional additions (including calling E). We 
describe the changes and present a number of examples for which the changes 
lead to a significant speedup. 


2 Preliminaries 


We will presume a familiarity with simple type theory and only give a quick 
description to make our use of notation clear, largely following [6]. We assume a 
set of base types, one of which is the type o of propositions (also called booleans),
and the rest we refer to as sorts. We use α, β to range over sorts and σ, τ to range
over types. The only types other than base types are function types στ, which
can be thought of as the type of functions from σ to τ.

All terms have a unique type and are inductively defined as (typed) variables,
(typed) constants, well-typed applications (t s) and λ-abstractions (λx.t). We


¹ Lash 1.0 along with accompanying material is available at http://grid01.ciirc.cvut.cz/~chad/ijcar2022lash/.
also include the logical constant ⊥ as a term of type o, terms (of type o) of the
form (s ⇒ t) (implications) and (∀x.t) (universal quantifiers) where s, t have
type o, and terms (of type o) of the form (s =_σ t) where s, t have a common
type σ. We also include choice constants ε_σ of type (σo)σ at each type σ. We write
¬t for t ⇒ ⊥ and (s ≠_σ t) for (s =_σ t ⇒ ⊥). We omit type parentheses and type
annotations except where they are needed for clarity. Terms of type o are also
called propositions. We also use ⊤, ∨, ∧, ∃ with the understanding that these are
notations for equivalent propositions in the set of terms above.

We assume terms are equal if they are the same up to α-conversion of bound
variables (using de Bruijn indices in the implementation). We write [s] for the
βη-normal form of s.

The tableau calculi of [6] (without choice) and [2] (with choice) define when
a branch is refutable. A branch is a finite set of normal propositions. We let A
range over branches and write A, s for the branch A ∪ {s}. We will not give a full
calculus, but will instead discuss a few of the rules with surprising properties.
Before doing so we emphasize rules that are not in the calculus. There is no
cut rule stating that if A, s and A, ¬s are refutable, then A is refutable. (During
search such a rule would require synthesizing the cut formula s.) There is also no
rule stating that if the branch A, (s = t), [ps], [pt] is refutable, then A, (s = t), [ps]
is refutable (where s, t have type σ and p is a term of type σo). That is, there is
no rule for rewriting into arbitrarily deep positions using equations.

All the tableau rules only need to examine the outer structure to test if they 
apply (when searching backwards for a refutation). When applying the rule, 
new formulas are constructed and added to the branch (or potentially multiple 
branches, each a subgoal to be refuted). An example is the confrontation rule, 
the only rule involving positive equations. The confrontation rule states that if 
S =a t and u Æa v are on a branch A (where a is a sort), then we can refute 
A by refuting Ajs 4 u,t Au and A,s Æ v,t Æ v. A similar rule is the mating 
rule, which states that if ps;...s, and —pt,...t, are on a branch A (where 
p is a constant of type o,---a,0), then we can refute A by refuting each of 
the branches A, s; 4 ti for each i € {1,...,n}. The mating rule demonstrates 
how disequations can appear on a branch even if the original branch to refute 
contained no reference to equality at all. One way a branch can be closed is if 
s Æ s is on the branch. In an implementation, this means an equality check is 
done for s and t whenever a disequation s Æ t is added to the branch. In Satallax 
this requires Ocaml to traverse the terms. In Lash this only requires comparing 
the unique integer ids the implementation assigns to the terms. 

The disequations generated on a branch play an important role. Terms (of
sort α) occurring on one side of a disequation on a branch are called discriminating terms. The rule for instantiating a quantified formula ∀x.t (where x has
sort α) is restricted to instantiating with discriminating terms (or a default term
if no terms of sort α are discriminating). During search in Satallax this means
there is a finite set of permitted instantiations (at sort α) and this set grows as
disequations are produced. Note that, unlike most automated theorem provers,
the instantiations do not arise from unification. In Satallax (and Lash) when

Lash 1.0 (System Description) 353 


∀x.t is being processed it is instantiated with all previously processed instantiations. When a new instantiation is produced, previously processed universally
quantified propositions are instantiated with it. When ∀x.t is instantiated with
s, then [(λx.t)s] is added to the branch. Such an instantiation is the important
case where the new formula involves term traversals: both for substitution and
normalization. In Satallax the substitution and normalization require multiple
term traversals. In Lash we have used normalizing substitutions and memoized
previous computations, minimizing the number of term traversals. The need
to instantiate arises when processing either a universally quantified proposition
(giving a new quantifier to instantiate) or a disequation at a sort (giving new
discriminating terms).
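The bookkeeping described above pairs every processed quantifier with every permitted instantiation of the same sort, whichever of the two arrives later. The Python sketch below is hypothetical and does not reproduce the actual Satallax or Lash data structures. Terms and propositions are plain strings, and the default term is simply added to the pool of instantiations, which is a simplification we chose for this sketch. It only records which (quantifier, term) instances would be produced, not the normalized formulas themselves.

```python
from collections import defaultdict

class InstantiationPool:
    """Hypothetical sketch: pair every processed universally quantified
    proposition with every permitted instantiation of the same sort."""

    def __init__(self):
        self.quantifiers = defaultdict(list)  # sort -> processed quantified propositions
        self.terms = defaultdict(list)        # sort -> permitted instantiations
        self.instances = []                   # (quantifier, term) pairs, in creation order

    def _add_term(self, sort, term):
        if term in self.terms[sort]:
            return
        self.terms[sort].append(term)
        for q in self.quantifiers[sort]:      # old quantifiers meet the new term
            self.instances.append((q, term))

    def add_quantifier(self, sort, q, default=None):
        self.quantifiers[sort].append(q)
        if not self.terms[sort] and default is not None:
            # No discriminating term yet: fall back to a default term
            # (treated as an ordinary instantiation in this sketch).
            self._add_term(sort, default)
            return
        for t in self.terms[sort]:            # the new quantifier meets old terms
            self.instances.append((q, t))

    def add_disequation(self, sort, s, t):
        # Both sides of a disequation become discriminating terms.
        self._add_term(sort, s)
        self._add_term(sort, t)

# Mirrors the S' = U branch of the example below: default d first, then c via d ≠ c.
pool = InstantiationPool()
pool.add_quantifier("alpha", "Q", default="d")
pool.add_disequation("alpha", "d", "c")
print(pool.instances == [("Q", "d"), ("Q", "c")])   # True
```

Keeping both directions (new quantifier against old terms, new term against old quantifiers) ensures that every pair is instantiated exactly once, regardless of arrival order.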

We discuss a small example that both Satallax and Lash can easily prove. We
briefly describe what both do in order to give the flavor of the procedure and
(hopefully) prevent readers from assuming the provers behave too similarly to
provers based on other calculi (e.g., resolution).

Example SEV241^5 from TPTP v7.5.0 [9] (X5201A from TPS [1]) contains a
few features going beyond first-order logic. The statement to prove
is

∀x. U x ∧ W x ⇒ ∀S. (S = U ∨ S = W) ⇒ S x.

Here U and W are constants of type αo, x is a variable of type α and S is a
variable of type αo. The higher-order aspects of this problem are the quantifier
for S (though this could be circumvented by making S a constant like U and W)
and the equations between predicates (though these could be circumvented by
replacing S = U by ∀y. S y = U y and replacing S = W similarly). The tableau
rules effectively do both during search.

Satallax never clausifies. The formula above is negated and assumed. We will
informally describe tableau rules as splitting the problem into subgoals, though
this is technically mediated through MiniSat (where the set of MiniSat clauses
is unsatisfiable when all branches are closed). Tableau rules are applied until
the problem involves a constant c (for x), a constant S' for S and assumptions
U c, W c, S' = U ∨ S' = W and ¬S' c on the branch. The disjunction is
internally ¬(S' = U) ⇒ S' = W and the implication rule splits the problem into
two branches, one with S' = U and one with S' = W. Both branches are solved
in analogous ways and we only describe the S' = U branch. Since S' = U is an
equation at function type, the relevant rule adds ∀y. S' y = U y to the branch.
Since there are no disequations on the branch, there is no instantiation available
for ∀y. S' y = U y. In such a case, a default instantiation is created and used. That
is, a default constant d (of sort α) is generated and we instantiate with this d,
giving S' d =_o U d. The rule for equations at type o splits into two subgoals: one
branch with S' d and U d and another with ¬S' d and ¬U d. On the first branch
we mate S' d with ¬S' c, adding the disequation d ≠ c to the branch. This makes c
available as an instantiation for ∀y. S' y = U y. After instantiating with c the rest
of the subcase is straightforward. In the other subgoal we mate U c with ¬U d,
giving the disequation c ≠ d. Again, c becomes available as an instantiation and
the rest of the subcase is straightforward.
3 Terms with Perfect Sharing


Lash represents normal terms as C structures, with a unique integer id assigned
to each term. The structure contains a tag indicating which kind of term is
represented, and a number that is used to indicate either the de Bruijn index (for
a variable), the name (for a constant), or the type (for a λ-abstraction, a universal quantifier, a choice operator, or an equation). Two pointers (optionally)
point to relevant subterms in each case. In addition, the structure maintains
the information of which de Bruijn indices are free in the term (with de Bruijn
indices limited to a maximum of 255). Knowing the free de Bruijn indices of
terms makes recognizing potential η-redexes possible without traversing the λ-abstraction. Likewise it is possible to determine when shifting and substitution
of de Bruijn indices would not affect a term, avoiding the need to traverse the
term.

In Ocaml only the unique integer id is directly revealed, and this is sufficient
to test for equality of terms. Hash tables are used to uniquely assign integers to types
and to strings (for names), and these integers are used to
interface with the C code. Various functions are used in the Ocaml-C interface to
request the construction of (normal) terms. For example, given the two Ocaml
integer ids i and j corresponding to terms s and t, the function mk_norm_ap
given i and j will return an integer k corresponding to the normal term [s t].
The C implementation recognizes if s is a λ-abstraction and performs all βη-reductions to obtain a normal term. Additionally, the C implementation treats
terms as graphs with perfect sharing and caches previous operations
(including substitutions and de Bruijn shifting) to prevent recomputation.
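The following Python sketch illustrates perfect sharing together with a normalizing application in the spirit of mk_norm_ap. It is not Lash's C code, and the class and method names are hypothetical. Only β-reduction is performed: η-reduction and the free-index bookkeeping that lets Lash skip traversals are omitted, and simple typing is assumed so that the normalizing (hereditary) substitution terminates. Every distinct term is interned once, so term equality becomes an integer comparison, and shifting and substitution results are memoized.

```python
class TermBank:
    """Hypothetical sketch of hash-consed de Bruijn terms (beta only)."""

    def __init__(self):
        self.nodes = []        # id -> node tuple
        self.index = {}        # node tuple -> id (perfect sharing)
        self.shift_memo = {}   # (term, d, cutoff) -> term
        self.subst_memo = {}   # (term, j, arg) -> term

    def _intern(self, node):
        if node not in self.index:
            self.index[node] = len(self.nodes)
            self.nodes.append(node)
        return self.index[node]

    def var(self, i):
        return self._intern(("var", i))

    def const(self, name):
        return self._intern(("const", name))

    def lam(self, body):
        return self._intern(("lam", body))

    def shift(self, t, d, cutoff=0):
        """Add d to every de Bruijn index >= cutoff in t (memoized)."""
        key = (t, d, cutoff)
        if key not in self.shift_memo:
            node = self.nodes[t]
            if node[0] == "var":
                r = self.var(node[1] + d) if node[1] >= cutoff else t
            elif node[0] == "const":
                r = t
            elif node[0] == "lam":
                r = self.lam(self.shift(node[1], d, cutoff + 1))
            else:  # "app": shifting a normal term cannot create a redex
                r = self._intern(("app", self.shift(node[1], d, cutoff),
                                  self.shift(node[2], d, cutoff)))
            self.shift_memo[key] = r
        return self.shift_memo[key]

    def subst(self, t, j, s):
        """Replace index j in t by s (lifted over j binders), lower indices
        above j, and rebuild applications with mk_norm_ap (memoized)."""
        key = (t, j, s)
        if key not in self.subst_memo:
            node = self.nodes[t]
            if node[0] == "var":
                i = node[1]
                if i == j:
                    r = self.shift(s, j)
                elif i > j:
                    r = self.var(i - 1)
                else:
                    r = t
            elif node[0] == "const":
                r = t
            elif node[0] == "lam":
                r = self.lam(self.subst(node[1], j + 1, s))
            else:
                r = self.mk_norm_ap(self.subst(node[1], j, s),
                                    self.subst(node[2], j, s))
            self.subst_memo[key] = r
        return self.subst_memo[key]

    def mk_norm_ap(self, f, a):
        """Id of the beta-normal form of (f a), for beta-normal f and a."""
        node = self.nodes[f]
        if node[0] == "lam":
            return self.subst(node[1], 0, a)
        return self._intern(("app", f, a))

bank = TermBank()
c, f = bank.const("c"), bank.const("f")
fx = bank.lam(bank.mk_norm_ap(f, bank.var(0)))              # λx. f x
apply_to_c = bank.lam(bank.mk_norm_ap(bank.var(0), c))      # λg. g c
# (λg. g c) (λx. f x) normalizes to f c and shares its id with f c built directly.
print(bank.mk_norm_ap(apply_to_c, fx) == bank.mk_norm_ap(f, c))   # True
```

Because the substitution calls mk_norm_ap when it rebuilds an application, the redex (λx. f x) c created in the example above is reduced on the spot. This mirrors the idea of a normalizing substitution.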

In addition to the low-level C term reimplementation, we have also provided a 
number of other low-level functionalities replacing the slower parts of the Ocaml 
code. This includes low-level priority queues, as well as C code used to associate 
the integers representing normal propositions with integers that are used to 
communicate with MiniSat. The MiniSat integers are nonzero and satisfy the 
property that minus on integers corresponds to negation of propositions. 
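A minimal sketch of such an association, assuming a plain dictionary in place of Lash's C code, is shown below. The class name is hypothetical, and ("not", p) merely stands in for the normal proposition p ⇒ ⊥. Variables are numbered from 1, so 0 never occurs, and negation maps to arithmetic minus, as MiniSat-style solvers expect.

```python
class LiteralMap:
    """Hypothetical sketch: give each proposition a nonzero integer so that
    the integer of a negation is the arithmetic negation of the integer."""

    def __init__(self):
        self.var_of = {}   # proposition -> positive variable index

    def literal(self, prop):
        # ("not", p) stands in for the normal proposition p => ⊥ here.
        if isinstance(prop, tuple) and prop and prop[0] == "not":
            return -self.literal(prop[1])
        if prop not in self.var_of:
            self.var_of[prop] = len(self.var_of) + 1   # start at 1; 0 is never used
        return self.var_of[prop]

lits = LiteralMap()
print(lits.literal("p"))                       # 1
print(lits.literal(("not", "p")))              # -1
print(lits.literal(("not", ("not", "p"))))     # 1
```

A consequence of letting minus represent negation is that a double negation receives the same integer as the original proposition.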


4 Results and Examples 


The first mode in the default schedule for Satallax 3.1 is MODE213. This mode 
activates one feature that goes beyond the basic calculus: pattern clauses. Addi- 
tionally the mode sets a flag that tries to split the initial goal into several indepen- 
dent subgoals before beginning the search proper. Through experimentation we 
have found that setting a flag (common to both Satallax and Lash) to essentially 
prevent MiniSat from searching (i.e., only using MiniSat to recognize contradic- 
tions that are evident without search) often improves the performance. We have 
created a modified mode MODE213D that deactivates these additions (and delays 
the use of MiniSat) so that Satallax and Lash will have a similar (and often the 
same) search space. (Sometimes the search spaces differ due to differences in the 
way Satallax and Lash enumerate instantiations for function types, an issue we 
Table 1. Lash vs. Satallax on 2053 TH0 problems.


Prover                        Problems solved
Lash                          1501 (73%)
Satallax (with E)             1487 (72%)
Satallax (without E)          1445 (70%)
Satallax (Lash schedule)      1412 (69%)


will not focus on here.) We have also run Lash with many variants of Satallax 
modes with similar modifications. From such test runs we have created a 10s 
schedule consisting of 5 modes. 

To give a general comparison of Satallax and Lash we have run both on 2053
TH0 problems from a recent release of the TPTP [9] (7.5.0). We initially selected
all problems with TPTP status of Theorem or Unsatisfiable (so they should be
provable in principle) without polymorphism (or similar extensions of TH0). We
additionally removed a few problems that could not be parsed by Satallax 3.4 
and removed a few hundred problems big enough to activate SINE in Satallax 
3.4. 

We ran Lash for 10s with its default schedule over this problem set. For 
comparison, we have run Satallax 3.4 for 10s in three different ways: using the 
Lash schedule (since the flag settings make sense for both systems) and using 
Satallax 3.4’s default schedule both with and without access to E [12]. The 
results are reported in Table 1. It is already promising that Lash has the ability 
to slightly outperform Satallax even when Satallax is allowed to call E. 

To get a clearer view of the improvement we discuss a few specific examples. 

TPTP problem NUM638^1 (part of Theorem 3 from the AUTOMATH formalization of Landau's book) is about the natural numbers (starting from 1). The
problem assumes a successor function s is injective and that every number other
than 1 has a predecessor. An abstract notion of existence is used by having
a constant some of type (ιo)o about which no extra assumptions are made,
so the assumption is formally ∀x. x ≠ 1 ⇒ some(λu. x = s u). For a fixed n,
n ≠ 1 is assumed and the conjecture to prove is the negation of the implication
(∀xy. n = s x ⇒ n = s y ⇒ x = y) ⇒ ¬some(λu. n = s u). The implication is
assumed and the search must rule out the negation of the antecedent (i.e., that
n has two predecessors) and the succedent (that n has no predecessor). Satallax
and Lash both take 3911 steps to prove this example. With MODE213D, Lash
completes the search in 0.4s while Satallax requires almost 29s.

TPTP problem SEV108^5 (SIX_THEOREM from TPS [1]) corresponds to proving that the Ramsey number R(3,3) is at most 6. The problem assumes there is a symmetric binary relation R (the edge relation of a graph with the sort as vertices) and there are (at least) 6 distinct elements. The conclusion is that there are either 3 distinct elements all of which are R-related or 3 distinct elements none of which are R-related. Satallax and Lash can solve the problem in 14129 steps with mode MODE213D. Satallax proves the theorem in 0.153s, while Lash proves the theorem in the same number of steps but in 0.046s.
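The combinatorial content of SEV108^5 can be sanity-checked by brute force. The Python sketch below is our own illustration, not part of Lash or of the TPTP problem, and its function names are hypothetical. It enumerates every symmetric relation on n elements (equivalently, every 2-coloring of the edges of the complete graph on n vertices) and checks whether each one contains 3 elements that are pairwise related or pairwise unrelated; this holds for 6 elements but fails for 5.

```python
from itertools import combinations, product

def has_homogeneous_triple(n, related):
    """True if 3 distinct elements are pairwise R-related or pairwise unrelated.

    `related` maps each pair (i, j) with i < j to True or False (R is symmetric,
    and reflexive pairs are irrelevant because the elements are distinct)."""
    for a, b, c in combinations(range(n), 3):
        if related[(a, b)] == related[(a, c)] == related[(b, c)]:
            return True
    return False

def ramsey_3_3_holds(n):
    """Check every symmetric relation on n elements (2 ** (n choose 2) cases)."""
    pairs = list(combinations(range(n), 2))
    for bits in product((False, True), repeat=len(pairs)):
        if not has_homogeneous_triple(n, dict(zip(pairs, bits))):
            return False
    return True

print(ramsey_3_3_holds(6))  # True: R(3,3) <= 6
print(ramsey_3_3_holds(5))  # False: e.g. the 5-cycle is a counterexample
```

Such an exhaustive check covers only 2^15 = 32768 relations for 6 elements; the R(3,4) variant on 9 elements would already require 2^36 cases.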

The difference is more impressive if we consider the modified problem of 
proving that R(3,4) is at most 9. That is, we assume there are (at least) 9 distinct
elements and modify the second disjunct of the conclusion to be that there are 
4 distinct elements none of which are R-related. Satallax and Lash both use 
186127 steps to find the proof. For Satallax this takes 44s while for Lash this 
takes 5.5s. 

The TPTP problem SYO506^1 is about an if-then-else operator. The problem has a constant c of type $o\iota\iota\iota$ (i.e., $o \to \iota \to \iota \to \iota$). Instead of giving axioms indicating that c behaves as an if-then-else operator, the conjecture is given as a disjunction:

$$(\forall x y.\, c\,(x = y)\,x\,y = y) \vee \neg(\forall x y.\, c\,\top\,x\,y = x) \vee \neg(\forall x y.\, c\,\bot\,x\,y = y).$$

After negating the conjecture and applying the first few tableau rules, the branch will contain the propositions $\forall x y.\, c\,\top\,x\,y = x$, $\forall x y.\, c\,\bot\,x\,y = y$ and the disequation $c\,(d = e)\,d\,e \neq e$ for fresh d and e of type $\iota$. In principle the rules for if-then-else given in [2] could be used to solve the problem without using the universally quantified formulas (other than to justify that c is an if-then-else operator). However, these rules are not implemented in Satallax or Lash. Instead, search proceeds as usual via the basic underlying procedure. Both Satallax and Lash can prove the example using mode MODEOC1 in 32704 steps. Satallax performs the search in 9.8s while Lash completes the search in 0.2s.
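Why the disjunction holds can be seen from the case split that the if-then-else rules would perform: either d = e, so $c\,\top\,d\,e = d = e$, or d ≠ e, so $c\,\bot\,d\,e = e$. The sketch below, our own finite-model illustration, checks the conjecture of SYO506^1 in every interpretation of c over a two-element domain. The encoding of c as a Python dictionary from (truth value, x, y) to an element is an assumption of the illustration, and the check says nothing about larger domains.

```python
from itertools import product

def conjecture_holds(c, dom):
    """The disjunction of SYO506^1 for one interpretation c: (bool, x, y) -> element."""
    first = all(c[(x == y, x, y)] == y for x in dom for y in dom)
    then_axiom = all(c[(True, x, y)] == x for x in dom for y in dom)
    else_axiom = all(c[(False, x, y)] == y for x in dom for y in dom)
    return first or not then_axiom or not else_axiom

def holds_in_all_interpretations(n):
    """Enumerate every c : o -> D -> D -> D with |D| = n (n ** (2 * n * n) functions)."""
    dom = range(n)
    keys = [(b, x, y) for b in (False, True) for x in dom for y in dom]
    for values in product(dom, repeat=len(keys)):
        if not conjecture_holds(dict(zip(keys, values)), dom):
            return False
    return True

print(holds_in_all_interpretations(2))  # True (256 interpretations)
```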

In addition to the examples considered above, we have constructed a family of examples intended to demonstrate the power of the shared term representation and caching of operations. Let cons have type $\iota\iota\iota$ and nil have type $\iota$. For each natural number n, consider the proposition $C^n$ given by

$$\bar{n}\,(\lambda x.\, \mathit{cons}\,x\,x)\,(\mathit{cons}\,\mathit{nil}\,\mathit{nil}) = \mathit{cons}\,(\bar{n}\,(\lambda x.\, \mathit{cons}\,x\,x)\,\mathit{nil})\,(\bar{n}\,(\lambda x.\, \mathit{cons}\,x\,x)\,\mathit{nil})$$

where $\bar{n}$ is the appropriately typed Church numeral. Proving the proposition $C^n$ does not require any search and merely requires the prover to normalize the conjecture and note that the two sides have the same normal form. However, this normal form on both sides will be a complete binary tree of depth n+1. We have run Lash and Satallax on $C^n$ with $n \in \{20, 21, 22, 23, 24\}$ using mode MODE213D. Lash solves all five problems in the same amount of time, less than 0.02s for each. Satallax takes 4s, 8s, 16s, 32s and 64s. As expected, since Satallax does not use a shared representation, the computation time grows exponentially with n.
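The effect of sharing on $C^n$ can be illustrated with hash-consing, a standard way to obtain a shared term representation. The Python sketch below is an assumption of ours, not Lash's C implementation, and it does not model the caching of normalization. `TermBank` gives structurally equal terms the same integer id, so both sides of $C^n$ normalize to the same id, and the normal form needs only n+2 nodes instead of the $2^{n+2}-1$ nodes of the unshared binary tree.

```python
class TermBank:
    """Hash-consing: structurally equal terms are built once and share one integer id."""

    def __init__(self):
        self.ids = {}    # (head, child ids) -> id
        self.nodes = []  # id -> (head, child ids); children always have smaller ids

    def mk(self, head, *args):
        key = (head, args)
        if key not in self.ids:
            self.ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self.ids[key]

def iterate_cons(bank, n, t):
    """Normal form of  n (lambda x. cons x x) t: apply x |-> cons x x to t, n times."""
    for _ in range(n):
        t = bank.mk("cons", t, t)
    return t

def unshared_size(bank, t):
    """Number of nodes of term t if it were stored as a tree without sharing."""
    sizes = []
    for _head, args in bank.nodes:
        sizes.append(1 + sum(sizes[a] for a in args))
    return sizes[t]

bank = TermBank()
nil = bank.mk("nil")
n = 24
lhs = iterate_cons(bank, n, bank.mk("cons", nil, nil))
half = iterate_cons(bank, n, nil)
rhs = bank.mk("cons", half, half)
print(lhs == rhs)                # True: equal normal forms, compared as integers
print(len(bank.nodes))           # 26 shared nodes, i.e. n + 2
print(unshared_size(bank, lhs))  # 67108863 = 2 ** (n + 2) - 1 tree nodes
```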


5 Conclusion and Future Work 


We have used Lash as a vehicle to demonstrate that giving a more efficient implementation of the underlying tableau calculus of Satallax can lead to significant performance improvements. An obvious possible extension of Lash would be to implement pattern clauses, higher-order unification and the ability to call E. While we may do this, our current plans are to focus on directions that further diverge from the development path followed by Satallax.

Interesting theoretical work would be to modify the underlying calculus (while maintaining completeness). For example, the rules of the calculus might be further restricted based on orderings of ground terms. On the other hand, new rules might be added to support a variety of constants with special properties. This was already done for constants that satisfy axioms indicating that the constant is a choice, description or if-then-else operator [2]. Suppose a constant r of type $\iota\iota o$ is known to be reflexive due to a formula $\forall x.\, r\,x\,x$ being on the branch. One could avoid ever instantiating this universally quantified formula by simply including a tableau rule that extends a branch with $s \neq t$ whenever $\neg\, r\,s\,t$ is on the branch. Similar rules could operationalize other special cases of universally quantified formulas, e.g., formulas giving symmetry or transitivity of a relation. A modification of the usual completeness proof would be required to prove completeness of the calculus with these additional rules (and with the restriction disallowing instantiating the corresponding universally quantified formulas).
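The proposed reflexivity rule can be sketched on a toy branch representation of our own, with literals as nested tuples such as ("not", ("r", "s", "t")); the rule and helper names are hypothetical and not part of Lash. For every negated literal of a relation known to be reflexive, the rule adds the disequation s ≠ t instead of instantiating $\forall x.\, r\,x\,x$; a derived disequation of the form t ≠ t closes the branch at once.

```python
def reflexivity_rule(branch, reflexive):
    """Hypothetical rule: for each literal ("not", (r, s, t)) with r known to be
    reflexive, derive the disequation ("neq", s, t) instead of instantiating
    the formula  forall x. r x x."""
    derived = set()
    for lit in branch:
        if lit[0] == "not" and lit[1][0] in reflexive:
            _r, s, t = lit[1]
            derived.add(("neq", s, t))
    return derived - branch  # only genuinely new formulas extend the branch

def closes_by_disequation(literals):
    """A disequation t != t is immediately contradictory."""
    return any(lit[0] == "neq" and lit[1] == lit[2] for lit in literals)

branch = {("not", ("r", "a", "b")), ("not", ("r", "c", "c")), ("p", "a")}
new = reflexivity_rule(branch, {"r"})
print(sorted(new))                 # [('neq', 'a', 'b'), ('neq', 'c', 'c')]
print(closes_by_disequation(new))  # True: the branch closes via c != c
```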

Finally, the C representation of terms could be extended to include precomputed special features. Just as the current implementation knows which de Bruijn indices are free in the term (without traversing the term), a future implementation could know other features of the term without requiring traversal. Such features could be used to guide the search.
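One way to know the free de Bruijn indices without traversal, assumed here and not necessarily Lash's C representation, is to store them as a bit mask computed once when a term is built: a variable with index i sets bit i, an application takes the union of its children's masks, and a λ-abstraction shifts its body's mask right by one because index 0 becomes bound. The names `Term`, `var`, `lam` and `app` are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    kind: str           # "var", "lam" or "app"
    index: int = 0      # de Bruijn index, only meaningful for "var"
    children: tuple = ()
    free: int = 0       # bit i is set iff de Bruijn index i occurs free

def var(i):
    return Term("var", index=i, free=1 << i)

def lam(body):
    # Index 0 becomes bound; indices 1, 2, ... are seen as 0, 1, ... outside.
    return Term("lam", children=(body,), free=body.free >> 1)

def app(fun, arg):
    return Term("app", children=(fun, arg), free=fun.free | arg.free)

def is_closed(t):
    """Constant-time test, no traversal of t."""
    return t.free == 0

def has_free(t, i):
    return ((t.free >> i) & 1) == 1

k = lam(lam(var(1)))                # \x. \y. x  is closed
print(is_closed(k))                 # True
print(has_free(app(k, var(2)), 2))  # True
```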
Abstract. We describe Goéland, an automated theorem prover for first-order logic that relies on a concurrent search procedure to find tableau
proofs, with concurrent processes corresponding to individual branches of 
the tableau. Since branch closure may require instantiating free variables 
shared across branches, processes communicate via channels to exchange 
information about substitutions used for closure. We present the proof 
search procedure and its implementation, as well as experimental results 
obtained on problems from the TPTP library. 


Keywords: Automated Theorem Proving - Tableaux - Concurrency 


1 Introduction 


Although clausal proof techniques have enjoyed success in automated theorem proving, some applications benefit from reasoning on unaltered formulas (rather than Skolemized clauses), while others require the production of proofs in a sequent calculus. These roles are fulfilled by provers based on the tableau method [17], as initially designed by Beth and Hintikka [2,13]. For first-order logic, efficient handling of universal formulas is typically achieved with free variables that are instantiated only when needed to close a branch. This step is said to be destructive because it may affect open branches sharing variables. This causes fairness (and consequently, completeness) issues, as illustrated in Fig. 1. In this example, exploring the left branch produces a substitution that prevents direct closure of the right branch. Reintroducing the original quantified formula with a different free variable is not sufficient to close the right branch, because an applicable δ-rule creates a new Skolem symbol that will result in a different but equally problematic substitution every time a left branch is explored. Thus, systematically exploring the left branch before the right leads to non-termination of the search. Conversely, exploring the right branch first produces a substitution (which instantiates the free variable X with a rather than b) that closes both branches.

Concurrent computing offers a way to implement a proof search procedure that explores branches simultaneously. Such a procedure can compare closing substitutions to detect (dis)agreements between branches, and consequently either close branches early, or restart proof attempts with limited backtracking. The simultaneous exploration of branches is handled by the concurrency system, either by interleaving computations through scheduling, or by executing tasks in parallel if the hardware resources allow it. A concurrent procedure naturally lends itself to parallel execution, allowing us to take advantage of multi-core architectures for efficient first-order theorem proving. Thus, concurrency provides an elegant and efficient solution to proof search with free variable tableaux.

Fig. 1. Incompleteness caused by unfair selection of branches. (Figure not reproduced: a free-variable tableau for $P(a) \wedge \neg P(b) \wedge \forall x.\,(P(x) \Leftrightarrow \forall y.\,P(y))$, with closing substitutions such as $\{X \mapsto b\}$ and $\{X' \mapsto b\}$ on the left branches and Skolem symbols $sk_1$, $sk_2$ on the right branches.)

In this paper, we describe a concurrent destructive proof search procedure 
for first-order analytic tableaux (Sect. 2) and its implementation in a tool called 
Goéland, as well as its evaluation on problems from the TPTP library [19] and 
comparison to other state-of-the-art provers (Sect. 3). 


Related Work. A lot of research has been carried out on the parallelization of 
proof search procedures [4], often focusing primarily on parallel execution and 
performance. In contrast, we use concurrency not only as a way to take advan- 
tage of multi-core architectures, but also as an algorithmic device that is useful 
even for sequential execution (with interleaved threads). Some concurrent and 
parallel approaches focus more distinctly on the exploration of the search space, 
either by dividing the search space between processes (distributed search) or by 
using processes with different search plans on the same space (multi search) [3]. 
These approaches can be performed either by heterogeneous systems that rely on cooperation between systems with different inference systems [1,8,12], or by homogeneous systems where all deductive processes use the same inference system.
According to this classification, the technique presented here is a homogeneous 
system that performs a distributed search. Concurrent tableaux provers include 
the model-elimination provers CPTheo [12] and Partheo [18], and the higher- 
order prover Hot [15], which notably uses concurrency to deal with fairness issues 
arising from the non-terminating nature of higher-order unification. Lastly, con- 
currency has been used as the basis of a generic framework to present various 
proof strategies [10] or allow distributed calculations over a network [21]. 
2 Concurrent Proof Search


Free Variable Tableaux. Goéland attempts to build a refutation proof for a first-order formula, i.e., a closed tableau for its negation, using a standard free-variable tableau calculus [11]. The calculus is composed of α-, γ- and δ-rules that extend a branch with one formula, β-rules that divide a branch by extending it with two formulas, and a ⊙-rule that closes a branch. γ-rules deal with universally quantified formulas by introducing a formula with a free variable. A free variable is not universally quantified, but is instead a placeholder for some term instantiation, typically determined upon branch closure. δ-rules deal with existentially quantified formulas by introducing a formula with a Skolem function symbol that takes as arguments the free variables in the branch. This ensures freshness of the Skolem symbol independently of variable instantiation.
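The γ- and δ-rules can be sketched on a toy formula representation. The encoding is our own assumption, not Goéland's data structures: formulas are nested tuples, free variables are written ("fv", k), and Skolem terms are written ("sk", k, args...) with the free variables of the branch as arguments. The helper names are hypothetical, and bound variable names are assumed to be pairwise distinct so that substitution cannot capture.

```python
import itertools

fresh = itertools.count(1)  # supply of fresh numbers for variables and Skolem symbols

def substitute(f, name, term):
    """Replace the bound variable `name` by `term` (bound names assumed distinct)."""
    if f == name:
        return term
    if isinstance(f, tuple):
        if f[0] in ("forall", "exists") and f[1] == name:
            return f  # `name` is rebound below this point
        return tuple(substitute(g, name, term) for g in f)
    return f

def free_vars(f):
    if isinstance(f, tuple):
        if f and f[0] == "fv":
            return {f}
        return set().union(*(free_vars(g) for g in f))
    return set()

def gamma(f):
    """forall x. phi  gives  phi[X/x] for a fresh free variable X."""
    _, x, body = f
    X = ("fv", next(fresh))
    return substitute(body, x, X), X

def delta(f, branch):
    """exists x. phi  gives  phi[sk(X1, ..., Xn)/x], X1..Xn the free variables of the branch."""
    _, x, body = f
    args = sorted(set().union(*(free_vars(g) for g in branch)))
    sk = ("sk", next(fresh), *args)
    return substitute(body, x, sk)

left, X = gamma(("forall", "x", ("P", "x")))
print(left)        # ('P', ('fv', 1))
print(delta(("exists", "y", ("not", ("P", "y"))), [left]))
# ('not', ('P', ('sk', 2, ('fv', 1))))
```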

The branch closure rule applies to a branch carrying an atomic formula P and a negated atomic formula ¬Q such that, for some substitution σ, $\sigma(P) = \sigma(Q)$. In that case, σ is applied to all branches. That rule is consequently destructive: applying a substitution to close one branch may modify another, removing the possibility to close it immediately. A tableau is closed when all its branches are closed. Closing a tableau can thus be seen as providing a global unifier that closes all branches.
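To make closing substitutions and global unifiers concrete, the sketch below applies textbook syntactic unification (with occurs check) to the two branches of Fig. 1 obtained after the β-rule. The representation is a hypothetical one of our own: uppercase strings are free variables, and we assume that the left branch has already instantiated ∀y.P(y) with a fresh variable Y. The substitution {X ↦ b} closes only the left branch, whereas {X ↦ a} closes both (the left one by additionally binding Y), matching the discussion of Fig. 1.

```python
def is_var(t):
    """Free variables are strings starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def unify(a, b, s):
    """Extend substitution s so that a and b become equal; None if impossible."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def closes(branch, s):
    """Is there P and not Q on the branch with s(P) = s(Q), possibly extending s?"""
    pos = [lit for lit in branch if lit[0] != "not"]
    neg = [lit[1] for lit in branch if lit[0] == "not"]
    return any(unify(p, q, s) is not None for p in pos for q in neg)

left = [("P", "a"), ("not", ("P", "b")), ("P", "X"), ("P", "Y")]
right = [("P", "a"), ("not", ("P", "b")), ("not", ("P", "X")),
         ("not", ("P", ("sk1", "X")))]
for sigma in ({"X": "b"}, {"X": "a"}):
    print(sigma, closes(left, sigma), closes(right, sigma))
# {'X': 'b'} True False
# {'X': 'a'} True True
```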


Semantics for Concurrency. Goéland relies on a concurrent search procedure. In order to present this procedure, we use a simple WHILE language augmented with instructions for concurrency, in the style of CSP [14]. Each process has its own variable store, as well as a collection of process identifiers used for communication: $\pi_{parent}$ denotes the identifier of a process's parent, while $\Pi_{children}$ denotes the collection of identifiers of active children of that process. Given a process identifier π and an expression e, the command π ! e is used to send an asynchronous message with the value e to the process identified by π. Conversely, the command π ? x blocks the execution until the process identified by π sends a message, which is stored in the variable x. Lastly, the instruction start creates a new process that executes a function with some given arguments, while the instruction kill interrupts the execution of a process according to its identifier.
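Python has no CSP primitives, so the sketch below emulates them under stated assumptions: π ! e is a put on an unbounded queue (which never blocks), π ? x is a blocking get on a queue dedicated to the pair (sender, receiver), and start corresponds to starting a thread. There is no counterpart of kill, since Python threads cannot be interrupted from outside. This illustrates the semantics only; it is not Goéland's Go implementation, and the class and function names are hypothetical. The two children report the closing substitutions of the branches of Fig. 1.

```python
import queue
import threading

class Process:
    """A process with one FIFO mailbox per sender, so that `pi ? x` can wait for a given pi."""

    def __init__(self, name):
        self.name = name
        self._mailboxes = {}
        self._lock = threading.Lock()

    def mailbox_from(self, sender):
        with self._lock:
            return self._mailboxes.setdefault(sender.name, queue.Queue())

def send(sender, receiver, value):
    """receiver ! value  (asynchronous: put on an unbounded queue never blocks)."""
    receiver.mailbox_from(sender).put(value)

def receive(receiver, sender):
    """sender ? x  (blocks until `sender` has sent a message to `receiver`)."""
    return receiver.mailbox_from(sender).get()

def child(me, parent, closing):
    send(me, parent, closing)  # report closing substitutions to the parent

parent = Process("parent")
closing = {"left": [{"X": "b"}], "right": [{"X": "a"}]}
procs = [Process(name) for name in closing]
threads = [threading.Thread(target=child, args=(p, parent, closing[p.name])) for p in procs]
for t in threads:
    t.start()  # plays the role of `start`
reports = {p.name: receive(parent, p) for p in procs}
for t in threads:
    t.join()
print(reports)  # {'left': [{'X': 'b'}], 'right': [{'X': 'a'}]}
```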


Proof Search Procedure. The proof search is carried out concurrently by processes corresponding to branches of the tableau. Processes are started upon application of a β-rule, one for each new branch. Communications between processes take two forms: a process may send a set of closing substitutions for its branch to its parent, or a parent may send a substitution (that closes the branch of one of its children) to the other children. The proof search is performed by the proofSearch, waitForParent, and waitForChildren procedures (described in Procedures 1, 2, and 3, respectively).

The proofSearch procedure initiates the proof search for a branch. It first attempts to apply the closure rule. A closing substitution is called local to a process if its domain includes only free variables introduced by this process or one of its descendants (i.e., if the variables do not occur higher in the proof tree). If one of the closing substitutions is local to the process, it is reported and the process terminates. If only non-local closing substitutions are found, they are reported and the process executes waitForParent. Otherwise, the procedure applies tableau expansion rules according to the priority α ≺ δ ≺ β ≺ γ. If a β-rule is applied, new processes are started, and each of them executes proofSearch on the newly created branch, while the current process executes waitForChildren.

Procedure 1: proofSearch
Data: a tableau T
begin
    var Θ ← applyClosingRule(T)
    for θ ∈ Θ do
        if isLocal(θ) then
            π_parent ! {θ}
            return
    if Θ ≠ ∅ then
        π_parent ! Θ
        waitForParent(T, Θ)
    else if applicableAlphaRule(T) then
        proofSearch(applyAlphaRule(T))
    else if applicableDeltaRule(T) then
        proofSearch(applyDeltaRule(T))
    else if applicableBetaRule(T) then
        for T' ∈ applyBetaRule(T) do
            start proofSearch(T')
        waitForChildren(T, ∅, ∅)
    else if applicableGammaRule(T) then
        proofSearch(applyGammaRule(T))
    else
        π_parent ! ∅

The waitForParent procedure is executed by a process after it has found non-local closing substitutions. Such substitutions may prevent closure in other branches. In these cases, the parent will eventually send another candidate substitution. waitForParent waits until such a substitution is received, and triggers a new step of proof search. The process may also be terminated by its parent (via the kill instruction) during the execution of this procedure, if one of the substitutions previously sent by the process leads to closing the parent's branch.

Procedure 2: waitForParent
Data: a tableau T, a set Θ_sent of substitutions sent by this process to its parent
begin
    π_parent ? σ
    if σ ∈ Θ_sent then
        π_parent ! σ
        waitForParent(T, Θ_sent)
    else
        proofSearch(σ(T))

The waitForChildren procedure is executed by a process after the application of a β-rule and the creation of child processes. The set of substitutions sent by each child is stored in a map subst (Procedure 3, Line 2), initially undefined everywhere (⊥). This procedure closes the branch (Line 13) if there exists a substitution θ that agrees with one closing substitution of each child process, i.e., for each child process, the process has reported a substitution σ such that $\sigma(X) = \theta(X)$ for any variable X in the domain of σ. If no such substitution can be found after all the children have closed their branches, then one closing substitution σ ∈ subst is picked arbitrarily (Line 18) and sent to all the children (which are at that point executing waitForParent) to restart their proof attempts. With the additional constraint of the substitution σ, the new proof attempts may fail, hence the necessity for backtracking among candidate substitutions $\Theta_{backtrack}$ (Lines 5 and 6). At the end, if all the substitutions were tried and failed, the process sends a failure message (symbolized by ∅) to its parent.
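The agreement test of Line 13 of Procedure 3 can be sketched as follows for ground substitutions represented as dictionaries: θ is obtained by merging one reported substitution per child, and a combination is rejected as soon as two of them map the same variable to different terms. Restricting to ground terms compared syntactically is a simplifying assumption (non-ground terms would call for unification), and the helper names are hypothetical. With the initial reports of Fig. 1, no θ exists, which is why the parent then has to pick one substitution arbitrarily (Line 18).

```python
from itertools import product

def merge(a, b):
    """Union of two ground substitutions, or None if they disagree on some variable."""
    out = dict(a)
    for var, term in b.items():
        if out.get(var, term) != term:
            return None
        out[var] = term
    return out

def agreement(subst):
    """Return some theta agreeing with one reported substitution of every child, or None.

    `subst` maps each child to the list of closing substitutions it reported."""
    for choice in product(*subst.values()):
        theta = {}
        for sigma in choice:
            theta = merge(theta, sigma)
            if theta is None:
                break
        if theta is not None:
            return theta
    return None

# Initial reports of Fig. 1: no common theta, so the parent must choose one (Line 18).
print(agreement({"left": [{"X": "b"}], "right": [{"X": "a"}]}))  # None
print(agreement({"c1": [{"X": "b"}, {"X": "a", "Z": "c"}], "c2": [{"X": "a"}]}))
# {'X': 'a', 'Z': 'c'}
```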

Thus, concurrency and backtracking are used to prevent incompleteness resulting from unfair instantiation of free variables. Another potential source of unfairness is the γ-rule, when applied more than once to a universal formula (reintroduction). This may be needed to find a refutation, but unbounded reintroductions would lead to unfairness. Iterative deepening [16] is used to guard against this: a bound limits the number of reintroductions on any single branch, and if no proof is found, the bound is increased and the proof search restarted.

Figure 2 illustrates the interactions between processes for the problem in Fig. 1, and shows how concurrency helps ensure fairness. It describes the parent process, in the top box, and below, the two child processes created upon application of the β-rule. Dotted lines separate successive states of a process (i.e., Procedures 1, 2 and 3 seen above), while arrows and boxes represent substitution exchanges. The number above each arrow indicates the chronology of the interactions. After both children have returned a substitution (1), the parent arbitrarily chooses one of them, starting with $X \mapsto b$, and sends it to the children (2). Since this substitution prevents closure in the right branch (3), the parent later backtracks and sends the other substitution $X \mapsto a$ (4), allowing both children (5) and then the parent to close successfully.


3 Implementation and Experimental Results

Implementation. The procedures presented in Sect. 2 are implemented in the Goéland prover¹ using the Go language. Go supports concurrency and parallelism, based on lightweight execution threads called goroutines [20]. Goroutines are executed according to a so-called hybrid threading (or M:N) model: M goroutines are executed over N effective threads and scheduling is managed by both the Go runtime and the operating system. This threading model allows the execution of a large number of goroutines with a reasonable consumption of system resources. Goroutines use channels to exchange messages, so that the implementation is close to the presentation of Sect. 2.

¹ Available at: https://github.com/GoelandProver/Goeland/releases/tag/v1.0.0-beta.

Procedure 3: waitForChildren
Data: a tableau T, a set Θ_sent of substitutions sent by this process to its parent, a set Θ_backtrack of substitutions used for backtracking
 1 begin
 2     var subst ← ⊥
 3     while ∃π ∈ Π_children. subst[π] = ⊥ do
 4         π ? subst[π]
 5         if subst[π] = ∅ then
 6             if ∃θ ∈ Θ_backtrack then
 7                 for π ∈ Π_children do π ! θ
 8                 waitForChildren(T, Θ_sent, Θ_backtrack \ {θ})
 9             else
10                 for π ∈ Π_children do kill π
11                 π_parent ! ∅
12                 return
13     if ∃θ, agreement(θ, subst) then
14         π_parent ! {θ}
15         for π ∈ Π_children do kill π
16         waitForParent(T, Θ_sent ∪ {θ})
17     else
18         σ ← choice(subst)
19         for π ∈ Π_children do π ! σ
20         waitForChildren(T, Θ_sent, Θ_backtrack ∪ (⋃_π subst[π] \ {σ}))

Goéland has, for the time being, no dedicated mechanism for equality reasoning. However, we have implemented an extension for deduction modulo theory [9], which transforms axioms into rewrite rules over propositions and terms. Deduction modulo theory has proved very useful to improve proof search when integrated into usual automated proof techniques [5], and also produces excellent results with manually-defined rewrite rules [6,7]. In Goéland, deduction modulo theory selects some axioms on the basis of a simple syntactic criterion and replaces them by rewrite rules.
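The kind of transformation performed by deduction modulo theory can be sketched on the set-theoretic inclusion axiom. The text does not spell out Goéland's syntactic selection criterion, so the criterion used here is an assumption: an axiom of the form ∀x̄. A ⇔ φ with A atomic becomes the proposition rewrite rule A → φ. The tuple encoding and helper names are also hypothetical, and rewriting of terms is not shown.

```python
CONNECTIVES = {"not", "and", "or", "implies", "iff", "forall", "exists"}

def match(pattern, term, variables, binding):
    """Syntactic matching: extend `binding` so that pattern instantiates to term, or None."""
    if isinstance(pattern, str) and pattern in variables:
        if pattern in binding:
            return binding if binding[pattern] == term else None
        return {**binding, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            binding = match(p, t, variables, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == term else None

def instantiate(f, binding):
    if isinstance(f, str):
        return binding.get(f, f)
    return tuple(instantiate(g, binding) for g in f)

def axiom_to_rule(axiom):
    """Assumed criterion: forall xs. A <=> phi, with A atomic, becomes the rule A --> phi."""
    variables = set()
    while axiom[0] == "forall":
        variables.add(axiom[1])
        axiom = axiom[2]
    if axiom[0] == "iff" and axiom[1][0] not in CONNECTIVES:
        return variables, axiom[1], axiom[2]
    return None

def rewrite_atom(atom, rules):
    """Replace an atom by the instantiated right-hand side of the first matching rule."""
    for variables, lhs, rhs in rules:
        binding = match(lhs, atom, variables, {})
        if binding is not None:
            return instantiate(rhs, binding)
    return atom

subset_axiom = ("forall", "x", ("forall", "y",
                ("iff", ("subset", "x", "y"),
                 ("forall", "z", ("implies", ("in", "z", "x"), ("in", "z", "y"))))))
rule = axiom_to_rule(subset_axiom)
print(rewrite_atom(("subset", "a", "b"), [rule]))
# ('forall', 'z', ('implies', ('in', 'z', 'a'), ('in', 'z', 'b')))
```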


Fig. 2. Proof search and resulting proof for $P(a) \wedge \neg P(b) \wedge \forall x.\,(P(x) \Leftrightarrow \forall y.\,P(y))$

Experimental Results. We evaluated Goéland on two problem categories with FOF theorems in the TPTP library (v7.4.0): syntactic problems without equality (SYN) and problems of set theory (SET). The former was chosen for its elementary nature, whereas the latter was picked primarily to evaluate the performance of deduction modulo theory, as the axioms of set theory are good targets for rewriting. We compared the results with those of five other provers: tableau-based provers Zenon (v0.8.5), Princess (v2021-05-10) and LeoIII (v1.6), as well as saturation-based provers E (v2.6) and Vampire (v4.6.1). Experiments were executed on a computer equipped with two 14-core Intel Xeon E5-2680 v4 2.4GHz processors and 128 GB of memory. Each proof attempt was limited to 300s. Table 1 and Fig. 3 report the results. Table 1 shows the number of problems solved by each prover, the cumulative time, and the number of problems solved by a given prover but not by Goéland (+) and conversely (−). Figure 3 presents the cumulative time required to solve a given number of problems.

As can be observed, the results of Goéland are comparable to, or slightly 
better than those of other tableau-based provers on problems from SYN, while 
saturation theorem provers achieve the best results. On this category, the axioms 
do not trigger deduction modulo theory rewriting rules, hence the similar results 
of Goéland and Goéland+DMT. On SET, Goéland+DMT obtains significantly bet- 
ter results than other tableau-based provers. This confirms the previous results 
on the performance of deduction modulo theory for set theory [6,7]. 
Table 1. Experimental results over the TPTP library

Prover         SYN (263 problems)      SET (464 problems)
Goéland        199 (190s)              150 (4659s)
Goéland+DMT    199 (196s) (+0, −0)     278 (1292s) (+142, −14)
Zenon          256 (67s) (+60, −3)     150 (562s) (+75, −75)
Princess       195 (189s) (+1, −5)     258 (1168s) (+141, −33)
LeoIII         195 (268s) (+1, −5)     177 (2925s) (+77, −50)
E              261 (168s) (+62, −0)    363 (2377s) (+223, −10)
Vampire        262 (13s) (+63, −0)     321 (4122s) (+188, −17)
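Since the (+) and (−) columns are set differences with respect to the problems solved by Goéland, every row should satisfy solved = 199 + (+) − (−) on SYN and solved = 150 + (+) − (−) on SET. The short check below, a sketch of our own with hypothetical names, states this definition and recomputes the identity from the numbers transcribed from Table 1.

```python
BASELINE = {"SYN": 199, "SET": 150}  # problems solved by Goéland

# (solved, plus, minus) per category, transcribed from Table 1.
ROWS = {
    "Goéland+DMT": {"SYN": (199, 0, 0), "SET": (278, 142, 14)},
    "Zenon": {"SYN": (256, 60, 3), "SET": (150, 75, 75)},
    "Princess": {"SYN": (195, 1, 5), "SET": (258, 141, 33)},
    "LeoIII": {"SYN": (195, 1, 5), "SET": (177, 77, 50)},
    "E": {"SYN": (261, 62, 0), "SET": (363, 223, 10)},
    "Vampire": {"SYN": (262, 63, 0), "SET": (321, 188, 17)},
}

def plus_minus(solved_by_other, solved_by_goeland):
    """(+) = solved only by the other prover, (-) = solved only by Goéland."""
    return len(solved_by_other - solved_by_goeland), len(solved_by_goeland - solved_by_other)

def inconsistent_rows(rows, baseline):
    """Entries violating  solved = baseline + plus - minus."""
    return [(prover, cat)
            for prover, cats in rows.items()
            for cat, (solved, plus, minus) in cats.items()
            if solved != baseline[cat] + plus - minus]

print(plus_minus({1, 2, 3}, {3, 4}))       # (2, 1)
print(inconsistent_rows(ROWS, BASELINE))  # []
```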


Fig. 3. Cumulative time per problem solved for Goéland, Goéland+DMT (GDMT), Zenon, Princess, LeoIII, E, and Vampire (two panels: SYN category and SET category)


4 Conclusion 


We have presented a concurrent proof search procedure for tableaux in first- 
order logic with the aim of ensuring a fair exploration of the search space. This 
procedure has been implemented in the prover Goéland. This tool is still in an 
early stage, and (with the exception of deduction modulo theory) implements 
only the most basic functionalities, yet empirical results are encouraging. We 
plan on adding functionalities such as equality reasoning, arithmetic reasoning, 
and support for polymorphism to Goéland, which should increase its usability 
and performance. The integration of these functionalities in the context of a 
concurrent prover seems to be a promising line of research. Further investigation 
is also needed to prove the fairness, and therefore completeness, of our procedure. 
Abstract. A code X is not primitivity preserving if there is a primitive list $\mathbf{w} \in \mathrm{lists}\, X$ whose concatenation is imprimitive. We formalize a full characterization of such codes in the binary case in the proof assistant Isabelle/HOL. Part of the formalization, interesting on its own, is a description of {x, y}-interpretations of the square xx if |y| ≤ |x|. We also provide a formalized parametric solution of the related equation

$$x^j y^k = z^\ell.$$


1 Introduction 


Consider two words abba and b. It is possible to concatenate (several copies of) them as b·abba·b, and obtain a power of a third word, namely the square bab·bab of bab. In this paper, we completely describe all the ways in which this can happen for two words, and formalize this description in Isabelle/HOL.

The corresponding theory has a long history. The question can be formulated as solving equations in three variables of the special form $W(x,y) = z^\ell$, where the left-hand side is a sequence of x's and y's, and $\ell \ge 2$. The seminal result in this direction is the paper by R. C. Lyndon and M.-P. Schützenberger [10] from 1962, which solves, in the more general setting of free groups, the equation $x^j y^k = z^\ell$ with $2 \le j, k, \ell$. It was followed, in 1967, by a partial answer to our question by A. Lentin and M.-P. Schützenberger [9]. A complete characterization of monoids generated by three words was provided by L. G. Budkina and Al. A. Markov in 1973 [4]. The characterization was later, in 1976, reproved in a different way by Lentin's student J.-P. Spehner in his Ph.D. thesis [14], which even explicitly mentions the answer to the present question. See also a comparison of the two classifications by T. Harju and D. Nowotka [7]. In 1985, the result was again reproved by E. Barbin-Le Rest and M. Le Rest [1], this time specifically focusing on our question. Their paper contains a characterization of binary interpretations of a square as a crucial tool. The latter combinatorial result is interesting on its own, but is very little known. As far as we know, the proof is not available in English; moreover, it has to be reconstructed from Théorème 2.1 and Lemme 3.1 in [1], and it is long, technical and poorly structured, with many intuitive steps that have to be clarified. It is symptomatic, for example, that Maňuch [11] cites the claim as essentially equivalent to his desired result but nevertheless provides a different, shorter but similarly technical proof.

The fact that several authors opted to provide their own proof of the already known result, and that even a weaker result was republished as new, shows that the existing proof was not considered sufficiently convincing or approachable.
This makes the topic a perfect candidate for formalization. The proof we present 
here naturally contains some ideas of the proof from [1] but is significantly dif- 
ferent. Our main objective was to follow the basic methodological requirement 
of a good formalization, namely to identify claims that are needed in the proof 
and formulate them as separate lemmas and as generally as possible so that 
they can be reused not only in the proof but also later. Moreover, the formal- 
ization naturally forced us to consider carefully the overall strategy of the proof 
(which is rather lost behind technical details of published works on this topic). 
Under Isabelle’s pressure we eventually arrived at a hopefully clear proof struc- 
ture which includes a simple, but probably innovative use of the idea of “gluing” 
words. The analysis of the proof is therefore another, and we believe the most important, contribution of our formalization, in addition to the mere certainty that there are no gaps in the proof.

In addition, we provide a complete parametric solution of the equation $x^j y^k = z^\ell$ for arbitrary j, k and ℓ, a classification which is not very difficult, but perhaps too complicated to be useful in a mere unverified paper form.

The formalization presented here is an organic part of a larger project of 
formalization of combinatorics of words (see an introductory description in [8]). 
We are not aware of a similar formalization project in any proof assistant. The 
existence of the underlying library, which in turn extends the theories of “List” 
and “HOL-Library.Sublist” from the standard Isabelle distribution, critically 
contributes to a smooth formalization that gets fairly close to what a human paper proof would look like, outsourcing technicalities to the (reusable)
background. We accompany claims in this text with names of their formalized 
counterparts. 


2 Basic Facts and Notation 


Let Σ be an arbitrary set. Lists (i.e., finite sequences) $[a_1, a_2, \dots, a_n]$ of elements $a_i \in \Sigma$ are called words over Σ. The set of all words over Σ is usually denoted as $\Sigma^*$, using the Kleene star. A notorious ambivalence of this notation is related to the situation when we consider a set of words $X \subseteq \Sigma^*$, and are interested in lists over X. They should be denoted as elements of $X^*$. However, $X^*$ usually means something else (in the theory of rational languages), namely the set of all words in $\Sigma^*$ generated by the set X. To avoid the confusion, we will therefore follow the notation used in the formalization in Isabelle, and write lists X instead, to make clear that the entries of an element of lists X are themselves words. In order to further help to distinguish words over the basic alphabet from lists over a set of words, we shall use boldface variables for the latter.
In particular, it is important to keep in mind the difference between a letter a and the word [a] of length one, a distinction which is usually glossed over lightly in the literature on combinatorics on words. The set of words over Σ generated by X is then denoted as $\langle X \rangle$. The (associative) binary operation of concatenation of two words u and v is denoted by $u \cdot v$. We prefer this algebraic notation to Isabelle's original @. Moreover, we shall often omit the dot as usual. If $\mathbf{u} = [x_1, x_2, \dots, x_n] \in$ lists X is a list of words, then we write concat $\mathbf{u}$ for $x_1 \cdot x_2 \cdots x_n$. We write ε for the empty list, and $u^k$ for the concatenation of k copies of u (we use u^@k in the formalization). We write $u \le_p v$, $u <_p v$, $u \le_s v$, $u <_s v$, and $u \le_f v$ to denote that u is a prefix, a strict prefix, a suffix, a strict suffix and a factor (that is, a contiguous sublist) of v, respectively. A word is primitive if it is nonempty and not a power of a shorter word. Otherwise, we call it imprimitive. Each nonempty word w is a power of a unique primitive word $\rho w$, its primitive root. A nonempty word r is a periodic root of a word w if $w \le_p r \cdot w$. This is equivalent to w being a prefix of the right infinite power of r, denoted $r^\omega$. Note that we deal with finite words only, and we use the notation $r^\omega$ only as a convenient shortcut for "a sufficiently long power of r". Two words u and v are conjugate, written $u \sim v$, if u = rq and v = qr for some words r and q. Note that conjugation is an equivalence whose classes are also called cyclic words. A word u is a cyclic factor of w if it is a factor of some conjugate of w. A set of words X is a code if its elements do not satisfy any nontrivial relation, that is, they are a basis of a free semigroup. For a two-element set {x, y}, this is equivalent to x and y being non-commuting, i.e., $xy \neq yx$, and also to $\rho x \neq \rho y$. An important characterization of a semigroup S of words to be free is the stability condition, which is the implication $u, v, uz, zv \in S \Rightarrow z \in S$. The longest common prefix of u and v is denoted by $u \wedge_p v$. If {x, y} is a (binary) code, then $(x \cdot w) \wedge_p (y \cdot w') = xy \wedge_p yx$ for any sufficiently long $w, w' \in \langle \{x, y\} \rangle$. We explain some elementary facts from combinatorics on words used in this article in more detail in Sect. 8.
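Several of these notions are easy to compute for concrete words. The Python sketch below is our own illustration, unrelated to the Isabelle formalization; it represents words as strings and lists of words as tuples, and its function names are hypothetical. It implements primitivity, the primitive root, conjugacy, the binary code test xy ≠ yx and the longest common prefix, and checks them on the introductory example b·abba·b = (bab)^2 and on the code {ab, aba} with w = (ab)^5 and w' = (aba)^4.

```python
def is_primitive(w):
    """Nonempty and not a power of a strictly shorter word (works for str and tuple)."""
    n = len(w)
    return n > 0 and all(w != w[:d] * (n // d) for d in range(1, n) if n % d == 0)

def primitive_root(w):
    """The unique primitive r with w = r^k, for nonempty w."""
    n = len(w)
    return next(w[:d] for d in range(1, n + 1) if n % d == 0 and w[:d] * (n // d) == w)

def conjugate(u, v):
    """u ~ v iff u = rq and v = qr for some words r, q."""
    return len(u) == len(v) and any(u[i:] + u[:i] == v for i in range(len(u) + 1))

def is_binary_code(x, y):
    """For nonempty x and y, {x, y} is a code iff xy != yx."""
    return x + y != y + x

def lcp(u, v):
    """Longest common prefix u ^p v."""
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]

w = ("b", "abba", "b")                # a list of words
print(is_primitive(w))                # True: w is a primitive list
print(primitive_root("".join(w)))     # bab: concat w = (bab)^2 is imprimitive
print(conjugate("abbabb", "babbab"))  # True
x, y = "ab", "aba"
print(is_binary_code(x, y), lcp(x + x * 5, y + y * 4), lcp(x + y, y + x))  # True aba aba
```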


3 Main Theorem 


Let us introduce the central definition of the paper. 


Definition 1. We say that a set X of words is primitivity preserving if there
is no word $w \in \mathrm{lists}\, X$ such that

- $|w| \ge 2$;
- w is primitive; and
- concat w is imprimitive.
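A bounded brute-force search makes Definition 1 concrete. The sketch below is an illustration only: the names witnesses and is_primitive are hypothetical, and the search enumerates lists w over a finite set X of strings with 2 ≤ |w| ≤ max_len, reporting those with w primitive (as a list) and concat w imprimitive (as a word). Such a search can refute primitivity preservation but never prove it. For the code {01010, 1001}, used as an example in Sect. 5, Theorem 1 below predicts that the witnesses are exactly the conjugates of [x, x, y].

```python
from itertools import product


def is_primitive(seq) -> bool:
    """Primitivity for strings and for tuples of words alike:
    nonempty and not equal to a proper power of one of its prefixes."""
    n = len(seq)
    return n > 0 and all(
        seq[:d] * (n // d) != seq for d in range(1, n) if n % d == 0
    )


def witnesses(X, max_len):
    """Lists w over X with 2 <= |w| <= max_len such that w is primitive
    and concat w is imprimitive (Definition 1); w is a tuple of words."""
    found = []
    for n in range(2, max_len + 1):
        for w in product(sorted(X), repeat=n):
            if is_primitive(w) and not is_primitive("".join(w)):
                found.append(w)
    return found
```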


Note that our definition does not take into account singletons w = [x]. In
particular, X can be primitivity preserving even if some $x \in X$ is imprimitive.
Nevertheless, in the binary case, we will also provide some information about 
the cases when one or both elements of the code have to be primitive. 

In [12], V. Mitrana formulates the primitivity of a set in terms of morphisms, 
and shows that X is primitivity preserving if and only if it is the minimal set of 
generators of a "pure monoid", cf. [3, p. 276]. This brings about a wider concept
of morphisms preserving a given property, most classically square-freeness, see 
for example a characterization of square-free morphisms over three letters by M. 
Crochemore [5]. 

The target claim of our formalization is the following characterization of 
words witnessing that a binary code is not primitivity preserving: 


Theorem 1 (bin_imprim_code). Let B = {x, y} be a code that is not primitivity
preserving. Then there are integers $j \ge 1$ and $k \ge 1$, with k = 1 or
j = 1, such that the following conditions are equivalent for any $w \in \mathrm{lists}\, B$
with $|w| \ge 2$:

- w is primitive, and concat w is imprimitive;
- w is conjugate with $[x]^j [y]^k$.

Moreover, assuming $|y| \le |x|$,

- if $j \ge 2$, then j = 2 and k = 1, and both x and y are primitive;
- if $k \ge 2$, then j = 1 and x is primitive.


Proof. Let w be a word witnessing that B is not primitivity preserving. That is,
$|w| \ge 2$, w is primitive, and concat w is imprimitive. Since $[x]^j [y]^k$ and $[y]^k [x]^j$
are conjugate, we can suppose, without loss of generality, that $|y| \le |x|$.

First, we want to show that w is conjugate with $[x]^j [y]^k$ for some $j, k \ge 1$
such that k = 1 or j = 1. Since w is primitive and of length at least two, it
contains both x and y. If it contains one of these letters exactly once, then w is
clearly conjugate with $[x]^j [y]^k$ for j = 1 or k = 1. Therefore, the difficult part
is to show that no primitive w with concat w imprimitive can contain both
letters at least twice. This is the main task of the rest of the paper, which is
finally accomplished by Theorem 4 claiming that words that contain at least two
occurrences of x are conjugate with [x, x, y]. To complete the proof of the first
part of the theorem, it remains to show that j and k do not depend on w. This
follows from Lemma 1.

Note that the imprimitivity of concat w induces the equality $x^j y^k = z^\ell$
for some z and $\ell \ge 2$. The already mentioned seminal result of Lyndon and
Schützenberger shows that j and k cannot be simultaneously at least two, since
otherwise x and y commute. For the same reason, considering its primitive root,
the word y is primitive if $j \ge 2$. Similarly, x is primitive if $k \ge 2$. The primitivity
of x when j = 2 is a part of Theorem 4.


We start by giving a complete parametric solution of the equation $x^j y^k = z^\ell$
in the following theorem. This will eventually yield, after the proof of Theorem
1 is completed, a full description of binary codes that are not primitivity preserving.
Since the equation is mirror symmetric, we omit symmetric cases by assuming
$|y| \le |x|$.


Theorem 2 (LS_parametric_solution). Let $\ell \ge 2$, $j, k \ge 1$ and $|y| \le |x|$.
The equality $x^j y^k = z^\ell$ holds if and only if one of the following cases takes
place:

A. There exist a word r and integers $m, n, t \ge 0$ such that
$$mj + nk = t\ell, \quad\text{and}\quad x = r^m, \quad y = r^n, \quad z = r^t;$$
B. $j = k = 1$ and there exist non-commuting words r and q, and integers
$m, n \ge 0$ such that
$$m + n + 1 = \ell, \quad\text{and}\quad x = (rq)^m r, \quad y = q(rq)^n, \quad z = rq;$$
C. $j = \ell = 2$, $k = 1$ and there exist non-commuting words r and q and an
integer $m \ge 2$ such that
$$x = (rq)^m r, \quad y = qrrq, \quad z = (rq)^m rrq;$$
D. $j = 1$ and $k \ge 2$ and there exist non-commuting words r and q such that
$$x = (qr^k)^{\ell-1} q, \quad y = r, \quad z = qr^k;$$
E. $j = 1$ and $k \ge 2$ and there are non-commuting words r and q and an integer
$m \ge 1$ such that
$$x = \bigl(qr\,(r(qr)^m)^{k-1}\bigr)^{\ell-2}\, qr\,(r(qr)^m)^{k-2}\, rq, \quad y = r(qr)^m, \quad z = qr\,(r(qr)^m)^{k-1}.$$
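The parametric families of Theorem 2 can be sanity-checked on small instances. The following Python sketch (hypothetical function names, words modeled as strings) builds x, y and z from sample words r and q according to the formulas of cases B to E as implemented below, and tests whether $x^j y^k = z^\ell$. It only exercises the "if" direction on a few instances and says nothing about the completeness of the case analysis.

```python
def case_B(r, q, m, n):
    # j = k = 1 and ell = m + n + 1
    return (r + q) * m + r, q + (r + q) * n, r + q


def case_C(r, q, m):
    # j = ell = 2, k = 1, m >= 2
    return (r + q) * m + r, q + r + r + q, (r + q) * m + r + r + q


def case_D(r, q, k, ell):
    # j = 1, k >= 2
    z = q + r * k
    return z * (ell - 1) + q, r, z


def case_E(r, q, m, k, ell):
    # j = 1, k >= 2, m >= 1
    y = r + (q + r) * m
    z = q + r + y * (k - 1)
    x = z * (ell - 2) + q + r + y * (k - 2) + r + q
    return x, y, z


def solves(xyz, j, k, ell):
    """Does the triple (x, y, z) satisfy x^j y^k = z^ell?"""
    x, y, z = xyz
    return x * j + y * k == z * ell
```

For example, case_E("a", "b", 1, 2, 2) yields x = baab, y = aba and z = baaba, and indeed $x y^2 = z^2$.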


Proof. If x and y commute, then all three words commute, hence they are
powers of a common word. A length argument yields the solution A.

Assume now that {x, y} is a code. Then no pair of the words x, y and z commutes.
We have shown in the overview of the proof of Theorem 1 that j = 1 or k = 1
by the Lyndon-Schützenberger theorem. The solution is then split into several
cases.


Case 1: $j = k = 1$.
Let m and r be such that $z^m r = x$ with r a strict prefix of z. By setting z = rq,
we obtain the solution B with $n = \ell - m - 1$.


Case 2: $j \ge 2$, $k = 1$.
Since $|y| \le |x|$ and $\ell \ge 2$, we have
$$2|z| \le |z^\ell| = |x^j| + |y| < 2|x^j|,$$
so z is a strict prefix of $x^j$.

As $x^j$ has both z and x as periodic roots, and z does not commute with x, the
Periodicity lemma implies $|x^j| < |x| + |z|$. That is, $z = x^{j-1}u$, $x^j = zv$ and
$x = uv$ for some nonempty words u and v. As v is a prefix of z, it is also a prefix
of x. Therefore, we have

$$x = uv = vu'$$
for some word u'. This is a well-known conjugation equality, which implies $u = rq$,
$u' = qr$ and $v = (rq)^n r$ for some words r, q and an integer $n \ge 0$.
We have
$$j|x| + |y| = |x^j y| = |z^\ell| = \ell(j-1)|x| + \ell|u|,$$
and thus $|y| = (\ell j - \ell - j)|x| + \ell|u|$. Since $|y| \le |x|$, $|u| > 0$, $j \ge 2$, and $\ell \ge 2$, it
follows that $\ell j - \ell - j = 0$, which implies $j = \ell = 2$. We therefore have $x^2 y = z^2$
and $x^2 = zv$, hence $vy = z$.

Combining $u = rq$, $u' = qr$, and $v = (rq)^n r$ with $x = vu'$, $z = x^{j-1}u = xu =
vu'u$, and $vy = z$, we obtain the solution C with $m = n + 1$. The assumption
$|y| \le |x|$ implies $m \ge 2$.
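The arithmetic behind the step from $|y| = (\ell j - \ell - j)|x| + \ell|u|$ to $j = \ell = 2$ can be spelled out as follows; this only restates the argument above, with the middle line excluded by the assumption $|y| \le |x|$.

```latex
\begin{align*}
  \ell j - \ell - j &= (\ell - 1)(j - 1) - 1 \;\ge\; 0
      && (\ell, j \ge 2),\\
  \ell j - \ell - j \ge 1 &\implies |y| \ge |x| + \ell\,|u| > |x|
      && (|u| > 0),\\
  \ell j - \ell - j = 0 &\implies (\ell - 1)(j - 1) = 1
      \implies \ell = j = 2.
\end{align*}
```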


Case 3: $j = 1$, $k \ge 2$, $y^k \le_s z$.
We have $z = qy^k$ for some word q. Noticing that $x = z^{\ell-1}q$ yields the solution
D (with r = y).


Case 4: $j = 1$, $k \ge 2$, $z <_s y^k$.
This case is analogous to the second part of Case 2. Using the Periodicity lemma,
we obtain $uy^{k-1} = z$, $y^k = vz$, and $y = vu$ with nonempty u and v. As v is a
suffix of z, it is also a suffix of y, and we have $y = vu = u'v$ for some u'. Plugging
the solution of the last conjugation equality, namely $u' = rq$, $u = qr$, $v = (rq)^n r$,
into $y = u'v$, $z = uy^{k-1}$ and $z^{\ell-1} = xv$ gives the solution E with $m = n + 1$.

Finally, the words r and q do not commute since x and y, which are generated
by r and q, do not commute.

The proof is completed by a direct verification of the converse.


We now show that, for a given binary code that is not primitivity preserving, there
is a unique pair of exponents (j, k) such that $x^j y^k$ is imprimitive.


Lemma 1 (LS_unique). Let B = {x, y} be a code. Assume $j, k, j', k' \ge 1$. If
both $x^j y^k$ and $x^{j'} y^{k'}$ are imprimitive, then $j = j'$ and $k = k'$.


Proof. Let $z_1, z_2$ be primitive words and $\ell, \ell' \ge 2$ be such that
$$x^j y^k = z_1^\ell \quad\text{and}\quad x^{j'} y^{k'} = z_2^{\ell'}. \qquad (1)$$
Since B is a code, the words x and y do not commute. We proceed by
contradiction.


Case 1. First, assume that $j = j'$ and $k \neq k'$.

Let, without loss of generality, $k < k'$. From (1) we obtain $z_1^\ell y^{k'-k} = z_2^{\ell'}$. The
case $k' - k \ge 2$ is impossible due to the Lyndon-Schützenberger theorem. Hence
$k' - k = 1$. This is another place where the formalization triggered a simple
and nice general lemma (easily provable by the Periodicity lemma), which
will turn out to be useful also in the proof of Theorem 4. Namely, the lemma
imprim_ext_suf_comm claims that if both uv and uvv are imprimitive, then u
and v commute. We apply this lemma to $u = x^j y^{k-1}$ and $v = y$, obtaining a
contradiction with the assumption that x and y do not commute.
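The statement of imprim_ext_suf_comm is easy to test exhaustively on short words. The following Python sketch is a bounded sanity check over binary words, not a proof; it searches for pairs u, v such that uv and uvv are both imprimitive while $uv \neq vu$, and the lemma predicts that there are none. As in the text, the empty word counts as imprimitive. All names are hypothetical.

```python
from itertools import product


def is_primitive(w: str) -> bool:
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


def words(alphabet: str, max_len: int):
    """All words over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)


def counterexamples(max_len: int):
    """Pairs (u, v) with uv and uvv imprimitive but uv != vu."""
    return [
        (u, v)
        for u in words("ab", max_len)
        for v in words("ab", max_len)
        if not is_primitive(u + v)
        and not is_primitive(u + v + v)
        and u + v != v + u
    ]
```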


Case 2. The case $k = k'$ and $j \neq j'$ is symmetric to Case 1.


Case 3. Let finally $j \neq j'$ and $k \neq k'$. The Lyndon-Schützenberger theorem
implies that either j or k is one, and similarly either j' or k' is one. We can
therefore assume that $k = j' = 1$ and $k', j \ge 2$. Moreover, we can assume that
$|y| \le |x|$. Indeed, in the opposite case, we can consider the words $y^k x^j$ and $y^{k'} x^{j'}$
instead, which are also both imprimitive.

Theorem 2 now allows only the case C for the equality $x^j y = z_1^\ell$. We therefore
have $j = \ell = 2$ and $x = (rq)^m r$, $y = qrrq$ for an integer $m \ge 2$ and some non-commuting
words r and q. Since $y = qrrq$ is a suffix of $z_2^{\ell'}$, this implies that $z_2$
and rq do not commute. Consider the word $x \cdot qr = (rq)^m rqr$, which is a prefix
of xy, and therefore also of $z_2^{\ell'}$. This means that $x \cdot qr$ has two periodic roots,
namely rq and $z_2$, and the Periodicity lemma implies that $|x \cdot qr| < |rq| + |z_2|$.
Hence x is shorter than $z_2$. The equality $x y^{k'} = z_2^{\ell'}$, with $\ell' \ge 2$, now implies
on one hand that rqrq is a prefix of $z_2$, and on the other hand that $z_2$ is a
suffix of $y^{k'}$. It follows that rqrq is a factor of $(qrrq)^{k'}$. Hence rqrq and qrrq are
conjugate, thus they both have a period of length |rq|, which implies qr = rq.
This is a contradiction.
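Lemma 1 can also be observed on a concrete code. The Python sketch below (hypothetical helper names, bounded search only) lists all pairs (j, k) up to a bound for which $x^j y^k$ is imprimitive. For the code {01010, 1001} it should find the single pair (2, 1), whereas for the commuting pair ab, abab, which is not a code, every pair qualifies and uniqueness fails.

```python
def is_primitive(w: str) -> bool:
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


def imprimitive_exponents(x: str, y: str, bound: int):
    """All (j, k) with 1 <= j, k <= bound such that x^j y^k is imprimitive."""
    return [
        (j, k)
        for j in range(1, bound + 1)
        for k in range(1, bound + 1)
        if not is_primitive(x * j + y * k)
    ]
```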


The rest of the paper, and therefore also of the proof of Theorem 1, is organized
as follows. In Sect. 4, we introduce a general theory of interpretations,
which is behind the main idea of the proof, and apply it to the (relatively simple)
case of a binary code with words of the same length. In Sect. 5, we characterize
the unique disjoint extendable {x, y}-interpretation of the square of the longer
word x. This is a result of independent interest, and also the cornerstone of
the proof of Theorem 1, which is completed in Sect. 6 by showing that a word
containing at least two x's and witnessing that {x, y} is not primitivity preserving is
conjugate with [x, x, y].


4 Interpretations and the Main Idea 


Let X be a code, and let u be a factor of concat w for some $w \in \mathrm{lists}\, X$. The
natural question is to decide how u can be produced as a factor of words from
X, or, in other words, how it can be interpreted in terms of X. This motivates
the following definition.


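Definition 2. Let X be a set of words over an alphabet $\Sigma$. We say that the triple
$(p, s, w) \in \Sigma^* \times \Sigma^* \times \mathrm{lists}\, X$ is an X-interpretation of a word $u \in \Sigma^*$ if

- w is nonempty;
- $p \cdot u \cdot s = \mathrm{concat}\, w$;
- $p <_p \mathrm{hd}\, w$; and
- $s <_s \mathrm{last}\, w$.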


The definition is illustrated by the following figure, where $w = [w_1, w_2, w_3, w_4]$:


The second condition of the definition motivates the notation $p\, u\, s \sim_{\mathcal I} w$ for the
situation when (p, s, w) is an X-interpretation of u.
Remark 1. For the sake of historical reference, we remark that our definition of
X-interpretation differs from the one used in [1]. Their formulation of the situation
depicted by the above figure would be that u is interpreted by the triple
$(s', w_2 \cdot w_3, p')$ where $p \cdot s' = w_1$ and $p' \cdot s = w_4$. This is less convenient for two
reasons. First, the decomposition of $w_2 \cdot w_3$ into $[w_2, w_3]$ is only implicit here
(and even ambiguous if X is not a code). Second, while it is required that
the words p' and s' are a prefix and a suffix, respectively, of an element from X,
the identity of that element is left open, and has to be specified separately.


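If u is a nonempty element of $\langle X \rangle$ and $u = \mathrm{concat}\, \mathbf{u}$ for $\mathbf{u} \in \mathrm{lists}\, X$,
then the X-interpretation $\varepsilon\, u\, \varepsilon \sim_{\mathcal I} \mathbf{u}$ is called trivial. Note that the trivial
X-interpretation is unique if X is a code.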

As nontrivial X-interpretations of elements from $\langle X \rangle$ are of particular
interest, the following two concepts are useful.


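Definition 3. An X-interpretation $p\, u\, s \sim_{\mathcal I} w$ of $u = \mathrm{concat}\, \mathbf{u}$ is called

- disjoint if $\mathrm{concat}\, w' \neq p \cdot \mathrm{concat}\, \mathbf{u}'$ whenever $w' \le_p w$ and $\mathbf{u}' \le_p \mathbf{u}$;
- extendable if $p \le_s w_p$ and $s \le_p w_s$ for some elements $w_p, w_s \in \langle X \rangle$.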


Note that a disjoint X-interpretation is not trivial, and that being disjoint
is relative to a chosen factorization $\mathbf{u}$ of u (which is nevertheless unique if X is
a code).
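Definitions 2 and 3 can be made concrete on strings. In the following Python sketch, the names cuts, interpret and is_disjoint are hypothetical. Given a list ws of words and a factor of concat ws, specified by its start position and length, the sketch computes the minimal covering sublist together with p and s as in Definition 2. It then tests disjointness with respect to a factorization us of the factor by comparing cut points. This comparison suffices because all the words being compared are prefixes of the same word concat ws. Extendability is not checked.

```python
from itertools import accumulate


def cuts(ws):
    """Cut points 0, |w1|, |w1 w2|, ... of concat ws."""
    return [0] + list(accumulate(len(w) for w in ws))


def interpret(ws, start, length):
    """Minimal sublist of ws covering concat(ws)[start:start + length].

    Returns (p, s, cover) with p·u·s = concat cover, p a strict prefix of
    cover[0] and s a strict suffix of cover[-1], as in Definition 2.
    Assumes nonempty words, length > 0 and a factor inside concat ws.
    """
    c = cuts(ws)
    end = start + length
    first = max(i for i in range(len(ws)) if c[i] <= start)
    last = min(i for i in range(len(ws)) if c[i + 1] >= end)
    p = ws[first][: start - c[first]]
    s = ws[last][len(ws[last]) - (c[last + 1] - end):]
    return p, s, ws[first:last + 1]


def is_disjoint(p, cover, us):
    """Definition 3: concat w' != p·concat u' for all prefixes w', u'."""
    shifted = {len(p) + c for c in cuts(us)}
    return shifted.isdisjoint(cuts(cover))
```

For x = aba and y = bab, the factor x·y = ababab of concat [x, y, x] starting at position 2 gets the disjoint interpretation with p = ab and s = a, while the trivial interpretation of x·y is, as expected, not disjoint.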

The definitions above are naturally motivated by the main idea of the
characterization of sets X that do not preserve primitivity, which dates back
to Lentin and Schützenberger [9]. If w is primitive, while concat w is imprimitive,
say $\mathrm{concat}\, w = z^k$, $k \ge 2$, then the shift by z provides a nontrivial and
extendable X-interpretation of concat w (in fact, k − 1 such nontrivial
interpretations). Moreover, the following lemma, formulated in a more general setting
of two words $w_1$ and $w_2$, implies that the X-interpretation is disjoint if X is a
code.


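Lemma 2 (shift_interpret, shift_disjoint). Let X be a code. Let
$w_1, w_2 \in \mathrm{lists}\, X$ be such that $z \cdot \mathrm{concat}\, w_1 = \mathrm{concat}\, w_2 \cdot z$ where $z \notin \langle X \rangle$.
Then $z \cdot \mathrm{concat}\, v_1 \neq \mathrm{concat}\, v_2$ whenever $v_1 \le_p w_1^n$ and $v_2 \le_p w_2^n$, $n \in \mathbb{N}$.

In particular, $\mathrm{concat}\, \mathbf{u}$ has a disjoint extendable X-interpretation for any
prefix $\mathbf{u}$ of $w_1$.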


The excluded possibility is illustrated by the following figure. 


[Figure: concat v_2 marked on concat w_2 · concat w_2, and concat v_1 marked on concat w_1 · concat w_1.]
Proof. First, note that $z \cdot \mathrm{concat}\, w_1^n = \mathrm{concat}\, w_2^n \cdot z$ for any n. Let $w_1^n = v_1 \cdot v_1'$
and $w_2^n = v_2 \cdot v_2'$. If $z \cdot \mathrm{concat}\, v_1 = \mathrm{concat}\, v_2$, then also $\mathrm{concat}\, v_2' \cdot z =
\mathrm{concat}\, v_1'$. This contradicts $z \notin \langle X \rangle$ by the stability condition.

An extendable X-interpretation of $\mathrm{concat}\, \mathbf{u}$ is induced by the fact that $\mathrm{concat}\, \mathbf{u}$ is
covered by $\mathrm{concat}(w_2 \cdot w_2)$. The interpretation is disjoint by the first part of
the proof.


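In order to apply the above lemma to the imprimitive $\mathrm{concat}\, w = z^k$ of a
primitive w, set $w_1 = w_2 = w$. The assumption $z \notin \langle X \rangle$ follows from the
primitivity of w: indeed, if $z = \mathrm{concat}\, \mathbf{z}$ with $\mathbf{z} \in \mathrm{lists}\, X$, then $w = \mathbf{z}^k$ since
X is a code, contradicting the primitivity of w.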
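The shift argument can be observed numerically. In the sketch below (hypothetical names), z is taken to be the primitive root of concat w, which is one admissible choice of z. The cut points of w are compared with the same cut points shifted cyclically by |z|. For a primitive w over a code with imprimitive concat w, Lemma 2 predicts that the two sets never meet. The list ["ab", "ab"], where z = ab does lie in the monoid generated by X = {ab}, shows that the assumption on z cannot be dropped.

```python
from itertools import accumulate


def primitive_root(w: str) -> str:
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    raise ValueError("the empty word has no primitive root")


def shift_is_disjoint(ws) -> bool:
    """Check that the cut points of ws, shifted cyclically by |z| where
    concat ws = z^k and z is primitive, avoid the cut points of ws."""
    total = sum(len(w) for w in ws)
    shift = len(primitive_root("".join(ws)))
    cut = {c % total for c in accumulate((len(w) for w in ws), initial=0)}
    return all((c + shift) % total not in cut for c in cut)
```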

We first apply the main idea to a relatively simple case of a nontrivial {x, y}-interpretation
of the word $x \cdot y$, where x and y are of the same length.


Lemma 3 (uniform_square_interp). Let B = {x, y} be a code with |x| = |y|.
Let $p\,(x \cdot y)\,s \sim_{\mathcal I} v$ be a nontrivial B-interpretation. Then v = [x, y, x] or
v = [y, x, y], and $x \cdot y$ is imprimitive.


Proof. From $p \cdot x \cdot y \cdot s = \mathrm{concat}\, v$, it follows, by a length argument, that |v| is
three. A straightforward way to prove the claim is to consider all eight possible
candidates. In each case except v = [x, y, x] and v = [y, x, y], a routine proof of a
few lines, which we omit, shows that x = y. In the latter cases, $x \cdot y$ is a
nontrivial factor of its square $(x \cdot y) \cdot (x \cdot y)$, which yields the imprimitivity of
$x \cdot y$.
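The case analysis over the eight candidates in the proof of Lemma 3 is easy to mechanize for concrete words. The Python sketch below uses a hypothetical function name and assumes |x| = |y|. Following the length argument, it restricts attention to lists v of length three and enumerates all such lists together with all nonempty strict prefixes p of the first element after which x·y occurs in concat v. For x = aba and y = bab, where x·y = ababab is imprimitive, it finds exactly the two interpretations allowed by the lemma.

```python
from itertools import product


def nontrivial_interpretations(x: str, y: str):
    """Pairs (v, p): v a list of length three over {x, y}, p a nonempty
    strict prefix of v[0] such that x·y occurs in concat v right after p
    (so the remaining s is a nonempty strict suffix of v[-1])."""
    assert len(x) == len(y), "Lemma 3 assumes |x| = |y|"
    n, u = len(x), x + y
    found = []
    for v in product((x, y), repeat=3):
        c = "".join(v)
        for i in range(1, n):
            if c[i:i + 2 * n] == u:
                found.append((v, c[:i]))
    return found
```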


The previous (sketch of the) proof nicely illustrates on a small scale the
advantages of formalization. It is not necessary to choose between a tedious elementary
proof for the sake of completeness on one hand, and the suspicion that something
was missed on the other hand (leaving aside that the same suspicion typically
remains even after the tedious proof). A bit ironically, the most difficult part
of the formalization is to show that v is indeed of length three, which needs no
further justification in a human proof.

We have the following corollary which is a variant of Theorem 4, and also 
illustrates the main idea of its proof. 


Lemma 4 (bin_imprim_not_conjug). Let B = {x, y} be a binary code with
|x| = |y|. If $w \in \mathrm{lists}\, B$ is such that $|w| \ge 2$, w is primitive, and concat w is
imprimitive, then x and y are not conjugate.


Proof. Since w is primitive and of length at least two, it contains both letters
x and y. Therefore, it has either [x, y] or [y, x] as a factor. The imprimitivity of
concat w yields a nontrivial B-interpretation of $x \cdot y$, which implies that $x \cdot y$ is
not primitive by Lemma 3.

Let x and y be conjugate, and let $x = r \cdot q$ and $y = q \cdot r$. Since $x \cdot y = r \cdot q \cdot q \cdot r$
is imprimitive, also $r \cdot r \cdot q \cdot q$ is imprimitive. Then r and q commute by the
theorem of Lyndon and Schützenberger, a contradiction with $x \neq y$.
5 Binary Interpretation of a Square


Let B = {x, y} be a code such that $|y| \le |x|$. In accordance with the main
idea, the core technical component of the proof is the description of the disjoint
extendable B-interpretations of the square $x \cdot x$. This is a very nice result which is
relatively simple to state but difficult to prove, and which is valuable on its own.
As we mentioned already, it can be obtained from Théorème 2.1 and Lemme 3.1
in [1].


Theorem 3 (square_interp_ext.sq_ext_interp). Let B = {x, y} be a code
such that $|y| \le |x|$, both x and y are primitive, and x and y are not conjugate.
Let $p\,(x \cdot x)\,s \sim_{\mathcal I} w$ be a disjoint extendable B-interpretation. Then
$$w = [x, y, x], \qquad s \cdot p = y, \qquad p \cdot x = x \cdot s.$$


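In order to appreciate the theorem, note that the definition of interpretation
implies
$$p \cdot x \cdot x \cdot s = x \cdot y \cdot x,$$
hence $x \cdot y \cdot x = (p \cdot x)^2$. This will turn out to be the only way in which primitivity
may fail to be preserved if x occurs at least twice in w. Here is an example with
x = 01010 and y = 1001, where p = 01 and s = 10:
$$01010100101010 = 01010 \cdot 1001 \cdot 01010 = 01 \cdot 01010 \cdot 01010 \cdot 10.$$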


Proof. By the definition of a disjoint interpretation, we have $p \cdot x \cdot x \cdot s = \mathrm{concat}\, w$,
where $p \neq \varepsilon$ and $s \neq \varepsilon$. A length argument implies that w has length at least
three. Since a primitive word is not a nontrivial factor of its square, we have
$w = [\mathrm{hd}\, w] \cdot [y]^k \cdot [\mathrm{last}\, w]$, with $k \ge 1$. Since the interpretation is disjoint, we
can split the equality into $p \cdot x = \mathrm{hd}\, w \cdot y^m \cdot u$ and $x \cdot s = v \cdot y^\ell \cdot \mathrm{last}\, w$, where
$y = u \cdot v$, both u and v are nonempty, and $k = \ell + m + 1$. We want to show
$\mathrm{hd}\, w = \mathrm{last}\, w = x$ and $m = \ell = 0$. The situation is mirror symmetric, so we
can solve cases two at a time.
can solve cases two at a time. 

If $\mathrm{hd}\, w = \mathrm{last}\, w = y$, then powers of x and y share a factor of length at
least $|x| + |y|$. Since they are primitive, this implies that they are conjugate, a
contradiction. The same argument applies when $\ell \ge 1$ and $\mathrm{hd}\, w = y$ ($m \ge 1$
and $\mathrm{last}\, w = y$, respectively). Therefore, in order to prove $\mathrm{hd}\, w = \mathrm{last}\, w = x$,
it remains to exclude the case $\mathrm{hd}\, w = y$, $\ell = 0$ and $\mathrm{last}\, w = x$ ($\mathrm{last}\, w = y$,
$m = 0$ and $\mathrm{hd}\, w = x$, respectively). This is covered by one of the technical
lemmas that we single out:


Lemma 5 (pref_suf_pers_short). Let $x \le_p v \cdot x$, $x \le_s p \cdot u \cdot v \cdot u$ and $|x| > |v \cdot u|$
with $p \in \langle \{u, v\} \rangle$. Then $u \cdot v = v \cdot u$.
This lemma indeed excludes the case we wanted to exclude, since the
conclusion implies that y is not primitive. We skip the proof of the lemma here and
instead make an informal comment. Note that v is a periodic root of x. In other
words, x is a prefix of $v^\omega$. Therefore, with the stronger assumption that $v \cdot u \cdot v$ is
a factor of x, the conclusion follows easily by the familiar principle that v being
a factor of $v^\omega$ "synchronizes" primitive roots of v. Lemma 5 then exemplifies
one of the virtues of formalization, which makes it easy to generalize auxiliary
lemmas, often just by following the most natural proof and checking its minimal
necessary assumptions.
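A bounded search over short binary words is a cheap way to gain confidence in a statement like Lemma 5, although it is no substitute for the formal proof. The following Python sketch, with hypothetical names, enumerates the following objects:

- nonempty words u, v of length at most max_len;
- products p of at most two factors from {u, v};
- all suffixes x of p·u·v·u that are longer than v·u.

It collects the cases with x ≤p v·x but uv ≠ vu. The lemma predicts an empty result.

```python
from itertools import product


def words(max_len: int):
    """Nonempty words over {a, b} of length at most max_len."""
    for n in range(1, max_len + 1):
        for t in product("ab", repeat=n):
            yield "".join(t)


def lemma5_counterexamples(max_len: int):
    bad = []
    for u in words(max_len):
        for v in words(max_len):
            if u + v == v + u:
                continue  # the conclusion already holds
            # p ranges over products of at most two factors from {u, v}
            ps = [""] + [a + b for a in ("", u, v) for b in (u, v)]
            for p in ps:
                t = p + u + v + u
                for n in range(len(v + u) + 1, len(t) + 1):
                    x = t[-n:]  # suffix of p·u·v·u with |x| > |v·u|
                    if (v + x).startswith(x):  # x <=p v·x
                        bad.append((u, v, p, x))
    return bad
```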

Now we have $\mathrm{hd}\, w = \mathrm{last}\, w = x$, hence $p \cdot x = x \cdot y^m \cdot u$ and $x \cdot s = v \cdot y^\ell \cdot x$. The
natural way to describe this scenario is to observe that x has both the (prefix)
periodic root $v \cdot y^\ell$ and the suffix periodic root $y^m \cdot u$. Using Lemma 5 again, we
exclude the situations when $\ell = 0$ and $m \ge 1$ ($m = 0$ and $\ell \ge 1$, respectively). It therefore
remains to deal with the case when both m and $\ell$ are positive. We divide this
into four lemmas according to the size of the overlap that the prefix $v \cdot y^\ell$ and the
suffix $y^m \cdot u$ have in x. More exactly, the cases are:


- $|v \cdot y^\ell| + |y^m \cdot u| < |x|$
- $|x| \le |v \cdot y^\ell| + |y^m \cdot u| < |x| + |u|$
- $|x| + |u| \le |v \cdot y^\ell| + |y^m \cdot u| < |x| + |u \cdot v|$
- $|x| + |u \cdot v| \le |v \cdot y^\ell| + |y^m \cdot u|$


and they are solved by an auxiliary lemma each. The first three cases yield
that u and v commute, the first one being a straightforward application of the
Periodicity lemma. The last one is also a straightforward application of the
"synchronization" idea: it implies that $x \cdot x$ is a factor of $y^\omega$, a contradiction with
the assumption that x and y are primitive and not conjugate. Consequently, the
technical, tedious part of the whole proof is concentrated in the lemmas dealing with
the second and the third case (see lemmas short_overlap and medium_overlap
in the theory Binary_Square_Interpretation.thy). The corresponding proofs
are further analyzed and decomposed into more elementary claims in the
formalization, where further details can be found.

This completes the proof of w = [x, y, x]. A byproduct of the proof is the
description of the words x, y, p and s. Namely, there are non-commuting words r
and t, and positive integers m, k and $\ell$ such that
$$x = (rt)^m r, \qquad y = (tr)^k (rt)^\ell, \qquad p = (rt)^k, \qquad s = (tr)^\ell.$$
The second claim of the present theorem, that is, $y = s \cdot p$, is then equivalent to
$k = \ell$, and it is an easy consequence of the assumption that the interpretation
is extendable.
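This description of x, y, p and s can be checked against the identities of Theorem 3. The Python sketch below takes the parametrization in the form $x = (rt)^m r$, $y = (tr)^k (rt)^\ell$, $p = (rt)^k$, $s = (tr)^\ell$ with positive exponents and sample non-commuting words r, t. The function name is hypothetical. The identity p·x·x·s = x·y·x holds for every choice of parameters, while $y = s \cdot p$ holds exactly when $k = \ell$. The parameters r = 0, t = 1, m = 2 and k = ℓ = 1 reproduce the example x = 01010, y = 1001 given after Theorem 3.

```python
def square_interp_family(r: str, t: str, m: int, k: int, ell: int):
    """x = (rt)^m r, y = (tr)^k (rt)^ell, p = (rt)^k, s = (tr)^ell."""
    rt, tr = r + t, t + r
    return rt * m + r, tr * k + rt * ell, rt * k, tr * ell
```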


6 The Witness with Two x's


In this section, we characterize words witnessing that {x,y} is not primitivity 
preserving and containing at least two x’s. 


380 S. Holub et al. 


Theorem 4 (bin_imprim_longer_twice). Let B = {x, y} be a code such that
$|y| \le |x|$. Let $w \in \mathrm{lists}\, \{x, y\}$ be a primitive word which contains x at least
twice and such that concat w is imprimitive.

Then $w \sim [x, x, y]$ and both x and y are primitive.


We divide the proof into three steps.


The Core Case. We first prove the claim with two additional assumptions,
which will be subsequently removed. Namely, the following lemma shows how
the knowledge about the B-interpretation of $x \cdot x$ from the previous section is
used. The additional assumptions are displayed as items.


Lemma 6 (bin_imprim_primitive). Let B = {x, y} be a code with $|y| \le |x|$
where

- both x and y are primitive,

and let $w \in \mathrm{lists}\, B$ be primitive such that concat w is imprimitive, and

- [x, x] is a cyclic factor of w.

Then $w \sim [x, x, y]$.


Proof. Choosing a suitable conjugate of w, we can suppose, without loss of
generality, that [x, x] is a prefix of w. Now, we want to show w = [x, x, y].
Proceed by contradiction and assume $w \neq [x, x, y]$. Since w is primitive, this
implies $w \cdot [x, x, y] \neq [x, x, y] \cdot w$.

By Lemma 4, we know that x and y are not conjugate. Let $\mathrm{concat}\, w = z^k$,
$2 \le k$ and z primitive. Lemma 2 yields a disjoint extendable B-interpretation of
$(\mathrm{concat}\, w)^2$. In particular, the induced disjoint extendable B-interpretation of
the prefix $x \cdot x$ is of the form $p\,(x \cdot x)\,s \sim_{\mathcal I} [x, y, x]$ by Theorem 3:


[Figure: the B-interpretation $p\,(x \cdot x)\,s \sim_{\mathcal I} [x, y, x]$, with p and s marked.]

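Let $\mathbf{p}$ be the prefix of w such that $\mathrm{concat}\, \mathbf{p} \cdot p = z$. Then
$$\mathrm{concat}(\mathbf{p} \cdot [x, y]) = z \cdot (x \cdot p), \qquad \mathrm{concat}\,[x, x, y] = (x \cdot p)^2, \qquad \mathrm{concat}\, w = z^k,$$
and we want to show $z = x \cdot p$, which will imply $\mathrm{concat}([x, x, y] \cdot w) = \mathrm{concat}(w \cdot
[x, x, y])$, hence w = [x, x, y] since {x, y} is a code, and both w and [x, x, y] are
primitive.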

Again, proceed by contradiction, and assume $z \neq x \cdot p$. Then, since both z and
$x \cdot p$ are primitive, they do not commute. We now have two binary codes, namely
$\{w, [x, x, y]\}$ and $\{z, x \cdot p\}$. The following two equalities (2) and (3) exploit the
fundamental property of longest common prefixes of elements of binary codes
mentioned in Sect. 2. In particular, we need the following lemma:
Lemma 7 (bin_code_lcp_concat). Let $X = \{u_0, u_1\}$ be a binary code, and
let $z_0, z_1 \in \mathrm{lists}\, X$ be such that $\mathrm{concat}\, z_0$ and $\mathrm{concat}\, z_1$ are not
prefix-comparable. Then
$$(\mathrm{concat}\, z_0) \wedge_p (\mathrm{concat}\, z_1) = \mathrm{concat}(z_0 \wedge_p z_1) \cdot (u_0 u_1 \wedge_p u_1 u_0).$$


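See Sect. 8 for more comments on this property. Denote $\alpha_{z,xp} = z \cdot xp \wedge_p xp \cdot z$.
Then also $\alpha_{z,xp} = z^k \cdot (xp)^2 \wedge_p (xp)^2 \cdot z^k$. Similarly, let $\alpha_{x,y} = x \cdot y \wedge_p y \cdot x$. Then
Lemma 7 yields
$$\alpha_{z,xp} = \mathrm{concat}(w \cdot [x, x, y]) \wedge_p \mathrm{concat}([x, x, y] \cdot w) = \mathrm{concat}(w \cdot [x, x, y] \wedge_p [x, x, y] \cdot w) \cdot \alpha_{x,y} \qquad (2)$$
and also
$$z \cdot \alpha_{z,xp} = \mathrm{concat}(w \cdot \mathbf{p} \cdot [x, y]) \wedge_p \mathrm{concat}(\mathbf{p} \cdot [x, y] \cdot w) = \mathrm{concat}(w \cdot \mathbf{p} \cdot [x, y] \wedge_p \mathbf{p} \cdot [x, y] \cdot w) \cdot \alpha_{x,y}. \qquad (3)$$
Denote
$$v_1 = w \cdot [x, x, y] \wedge_p [x, x, y] \cdot w, \qquad v_2 = w \cdot \mathbf{p} \cdot [x, y] \wedge_p \mathbf{p} \cdot [x, y] \cdot w.$$
From (2) and (3) we now have $z \cdot \mathrm{concat}\, v_1 = \mathrm{concat}\, v_2$. Since $v_1$ and $v_2$ are
prefixes of some $w^n$, we have a contradiction with Lemma 2.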
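The longest-common-prefix property stated in Lemma 7 can be tested exhaustively on short lists. The Python sketch below is a bounded check, not a proof, and its names are hypothetical. It enumerates all lists $z_0, z_1$ of length at most max_len over a binary code $\{u_0, u_1\}$, skips the pairs whose concatenations are prefix-comparable, and compares both sides of the equality. The code {aab, a}, with $u_0 u_1 \wedge_p u_1 u_0 = aa$, gives a nontrivial instance.

```python
from itertools import product


def lcp(a, b):
    """Longest common prefix of two strings or of two tuples."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]


def lemma7_holds(u0: str, u1: str, max_len: int) -> bool:
    alpha = lcp(u0 + u1, u1 + u0)
    lists = [z for n in range(max_len + 1) for z in product((u0, u1), repeat=n)]
    for z0 in lists:
        for z1 in lists:
            c0, c1 = "".join(z0), "".join(z1)
            if c0.startswith(c1) or c1.startswith(c0):
                continue  # prefix-comparable: the lemma does not apply
            if lcp(c0, c1) != "".join(lcp(z0, z1)) + alpha:
                return False
    return True
```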


Dropping the Primitivity Assumption. We first deal with the situation
when x and y are not primitive. A natural idea is to consider the primitive
roots of x and y instead of x and y. This means that we replace the word w
with Rw, where R is the morphism mapping [x] to $[\rho x]^e$ and [y] to $[\rho y]^f$,
where $x = (\rho x)^e$ and $y = (\rho y)^f$. For example, if x = abab and y = aa, and
w = [x, y, x] = [abab, aa, abab], then Rw = [ab, ab, a, a, ab, ab].
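The morphism R is straightforward to compute on strings. In the Python sketch below, the names are hypothetical, and each word of w is replaced by the appropriate number of copies of its primitive root. The example reproduces Rw = [ab, ab, a, a, ab, ab] for x = abab and y = aa, and illustrates that concat(Rw) = concat w.

```python
def primitive_root(w: str) -> str:
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    raise ValueError("the empty word has no primitive root")


def roots_morphism(ws):
    """Replace each word x = (rho x)^e in ws by e copies of rho x."""
    out = []
    for w in ws:
        r = primitive_root(w)
        out.extend([r] * (len(w) // len(r)))
    return out
```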

Let us check which hypotheses of Lemma 6 are satisfied in the new setting,
that is, for the code $\{\rho x, \rho y\}$ and the word Rw. The following facts are not
difficult to see:

- concat w = concat(Rw);
- if [c, c], $c \in \{x, y\}$, is a cyclic factor of w, then $[\rho c, \rho c]$ is a cyclic factor of Rw.

The next required property,

- if w is primitive, then Rw is primitive,

deserves more attention. It triggered another little theory of our formalization,
which can be found in the locale sings_code. Note that it fits well into our context,
since the claim is that R is a primitivity preserving morphism, which implies
that its images of the singletons [x] and [y] form a primitivity preserving set of
words, see theorem code.roots_prim_morph.

Consequently, the only missing hypothesis preventing the use of Lemma 6 is
$|y| \le |x|$, since it may happen that $|\rho x| < |\rho y|$. In order to solve this difficulty,
we shall ignore for a while the length difference between x and y, and obtain the
following intermediate lemma.
Lemma 8 (bin_imprim_both_squares, bin_imprim_both_squares_prim). Let
B = {x, y} be a code, and let $w \in \mathrm{lists}\, B$ be a primitive word such that
concat w is imprimitive. Then w cannot contain both [x, x] and [y, y] as cyclic
factors.


Proof. Assume that w contains both [x, x] and [y, y] as cyclic factors.

Consider the word Rw and the code $\{\rho x, \rho y\}$. Since Rw contains both
$[\rho x, \rho x]$ and $[\rho y, \rho y]$ as cyclic factors, Lemma 6 implies that Rw is conjugate either with the
word $[\rho x, \rho x, \rho y]$ or with $[\rho y, \rho y, \rho x]$, which is a contradiction with the assumed
presence of both squares.


Concluding the Proof by Gluing. It remains to deal with the existence of
squares. We use an idea that is our main innovation with respect to the proof
from [1], and contributes significantly to the reduction of the length of the proof, and
hopefully also to its increased clarity. Let w be a list over a set of words X. The
idea is to choose one of the words, say $u \in X$, and to concatenate (or "glue")
blocks of u's to the words following them. For example, if w = [u, v, u, u, z, u, z],
then the resulting list is [uv, uuz, uz]. This procedure is in the general case well
defined on lists whose last "letter" is not the chosen one, and it leads to a new
alphabet $\{u^i \cdot v \mid i \ge 0,\ v \in X,\ v \neq u\}$, which is a code if and only if X is. This idea is used
in an elegant proof of the Graph lemma (see [8] and [2]). In the binary case,
which is of interest here, if w in addition does not contain a square of a letter,
say [x, x], then the new code $\{x \cdot y, y\}$ is again binary. Moreover, the resulting
glued list w' has the same concatenation, and it is primitive if (and only if) w
is. Note that gluing is in this case closely related to the Nielsen transformation
$y \mapsto x^{-1}y$ known from the theory of automorphisms of free groups.
Induction on |w| now easily leads to the proof of Theorem 4.
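The gluing operation itself is a simple left-to-right pass. In the Python sketch below, the name glue is hypothetical, words are strings, and the input is assumed not to end with the chosen word u. It reproduces the example [u, v, u, u, z, u, z] ↦ [uv, uuz, uz] and preserves the concatenation, as stated above.

```python
def glue(ws, u):
    """Glue each maximal block of u's to the word that follows it."""
    if not ws or ws[-1] == u:
        raise ValueError("the list must be nonempty and not end with u")
    out, block = [], ""
    for w in ws:
        if w == u:
            block += w  # extend the current block of u's
        else:
            out.append(block + w)  # attach the block to the next word
            block = ""
    return out
```

In the binary case without the square [x, x], gluing x turns a list over {x, y} into a list over the binary code {x·y, y}.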


Proof (of Theorem 4). If w contains y at most once, then we are left with the
equation $x^j \cdot y = z^\ell$, $\ell \ge 2$. The equality j = 2 follows from the Periodicity
lemma, see Case 2 in the proof of Theorem 2.

Assume for contradiction that y occurs at least twice in w. Lemma 8 implies
that at least one square, [x, x] or [y, y], is missing as a cyclic factor. Let $\{x', y'\} =
\{x, y\}$ be such that [x', x'] is not a cyclic factor of w. We can therefore perform the
gluing operation, and obtain a new, strictly shorter word $w' \in \mathrm{lists}\, \{x' \cdot y', y'\}$.
The longer element $x' \cdot y'$ occurs at least twice in w', since the number of its
occurrences in w' is the same as the number of occurrences of x' in w, the
latter word containing both letters at least twice by assumption. Moreover, w'
is primitive, and $\mathrm{concat}\, w' = \mathrm{concat}\, w$ is imprimitive. Therefore, by induction
on |w|, we have $w' \sim [x' y', x' y', y']$. In order to show that this is not possible,
we can successfully reuse the lemma imprim_ext_suf_comm mentioned in the
proof of Lemma 1, this time for $u = x'y'x'$ and $v = y'$. The words u and v do
not commute because x' and y' do not commute. Since uv is imprimitive, the
word uvv, which is conjugate with concat w', is primitive, a contradiction.


This also completes the proof of our main target, Theorem 1. 
7 Additional Notes on the Formalization


The formalization is a part of an evolving combinatorics on words formalization
project. It relies on its backbone session, called CoW, a version of which is also
available in the Archive of Formal Proofs [15]. This session covers basic concepts
in combinatorics on words, including the Periodicity lemma. An overview
is available in [8].

The evolution of the parent session CoW continued along with the presented
results, and its latest stable version is available at our repository [16].
The main results are part of another Isabelle session, CoW_Equations, which,
as the name suggests, aims at dealing with word equations. We have greatly
expanded its elementary theory Equations_Basic.thy, which provides auxiliary
lemmas and definitions related to word equations. Notably, it contains the
definition factor_interpretation (Definition 2) and related facts.

Two dedicated theories were created: Binary_Square_Interpretation.thy
and Binary_Code_Imprimitive.thy. The first contains lemmas and locales
dealing with the {x, y}-interpretation of the square xx (for $|y| \le |x|$), culminating in
Theorem 3. The latter contains Theorems 1 and 4.

Another outcome was an expansion of formalized results related to the
Lyndon-Schützenberger theorem. This result, along with many useful corollaries,
was already part of the backbone session CoW, and it was newly supplemented
with the parametric solution of the equation $x^j y^k = z^\ell$, specifically Theorem 2
and Lemma 1. This formalization is now part of CoW_Equations in the theory
Lyndon_Schutzenberger.thy.

Similarly, the formalization of the main results triggered a substantial expan- 
sion of existing support for the idea of gluing as mentioned in Sect. 6. Its reworked 
version is now in a separate theory called Glued_Codes.thy (which is part of the 
session CoW_Graph_Lemma). 

Let us give a few concrete highlights of the formalization. A very useful
tool, which is part of the CoW session, is the reversed attribute. The attribute
produces a symmetrical fact where the symmetry is induced by the mapping rev,
i.e., the mapping which reverses the order of elements in a list. For instance, the
fact stating that if p is a prefix of v, then p is a prefix of $v \cdot w$, is transformed by
the reversed attribute into the fact saying that if s is a suffix of v, then s is a suffix
of $w \cdot v$. The attribute relies on ad hoc defined rules which induce the symmetry.
In the example, the main reversal rule is


$$(\mathrm{rev}\, u \le_p \mathrm{rev}\, v) = (u \le_s v).$$
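Isabelle's reversed attribute is not reproduced here. The small Python check below only confirms, on all binary words up to a given length, the reversal rule that the attribute relies on in this example: rev u is a prefix of rev v exactly when u is a suffix of v. Python's slicing [::-1] plays the role of rev, and the function name is hypothetical.

```python
from itertools import product


def reversal_rule_holds(max_len: int) -> bool:
    """Check (rev u <=p rev v) == (u <=s v) for all words over {a, b}."""
    ws = ["".join(t) for n in range(max_len + 1) for t in product("ab", repeat=n)]
    return all(
        v[::-1].startswith(u[::-1]) == v.endswith(u) for u in ws for v in ws
    )
```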


The attribute is used frequently in the present formalization. For instance, Fig. 1
shows the formalization of the proof of Cases 1 and 2 of Lemma 1. Namely,
the proof of Case 2 is smoothly deduced from the lemma that deals with Case 1,
avoiding writing down the same proof again up to symmetry. See [13] for more
details on the symmetry and the attribute reversed.

To be able to use this attribute fully in the formalization of the main results, it
had to be extended to deal with elements of type 'a list list,
as the constant factor_interpretation is of a function type over this exact
type. The new theories of the session CoW_Equations contain almost 50 uses of
this attribute.

Fig. 1. Highlights from the formalization in Isabelle/HOL.

(a) Using the reversed attribute to solve symmetric cases.

    proof (cases)
      case 1
      then show ?thesis
        using LS_unique_same assms(1, 4-8) by blast
    next
      case 2
      then show ?thesis
        using LS_unique_same[reversed] assms(1, 3, 5-8) by blast

(b) Methods primitivity_inspection, list_inspection and mismatch.

    have "primitive [x,x,y]"
      using ‹x ≠ y› by primitivity_inspection

    from ‹|ws| = 3› ‹ws ∈ lists {x,y}› ‹x ≠ y› ‹[x,x] ≤f ws · ws› ‹[y,y] ≤f ws · ws›
    show False
      by list_inspection simp_all

    from ‹p · t · s = t · t · p›
    have "p · t = t · p"
      by mismatch

The second highlight of the formalization is the use of simple but useful 
proof methods. The first method, called primitivity_inspection, is able to 
show primitivity or imprimitivity of a given word. 
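A word is primitive if it is not a power of a shorter word. For a concrete word this is decidable by a simple check. The Python sketch below is an illustration only, and we do not claim that primitivity_inspection works this way internally. It uses the standard fact that a nonempty word w is primitive exactly when w occurs in w · w only at positions 0 and |w|.

```python
def is_primitive(w: str) -> bool:
    """A nonempty word is primitive iff it occurs in w·w only at positions 0 and |w|."""
    if not w:
        return False                      # the empty word is not primitive
    return (w + w).find(w, 1) == len(w)

# Letters x and y stand for two distinct symbols, as in the list [x, x, y].
print(is_primitive("xxy"))    # True
print(is_primitive("xyxy"))   # False, since xyxy = (xy)^2
```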

Another method named list_inspection is used to deal with claims that 
consist of straightforward verification of some property for a set of words given 
by their length and alphabet. For instance, this method painlessly concludes 
the proof of lemma bin_imprim_both_squares_prim. The method divides the 
goal into eight easy subgoals corresponding to eight possible words. All goals are 
then discharged by simp_all. 
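The case split behind list_inspection can be mimicked outside Isabelle. Consider, for illustration only, a goal stating that no list ws of length 3 over {x, y} with x ≠ y has both [x, x] and [y, y] as factors of ws · ws. The Python sketch below enumerates the eight candidate lists and checks each one, much as simp_all discharges the eight subgoals. The helper names are hypothetical.

```python
from itertools import product

def is_factor(f: list, w: list) -> bool:
    """True iff f occurs as a contiguous sublist of w."""
    return any(w[i:i + len(f)] == f for i in range(len(w) - len(f) + 1))

def counterexamples(n: int = 3) -> list:
    """Lists ws of length n over {x, y} such that ws + ws contains both [x,x] and [y,y]."""
    found = []
    for letters in product("xy", repeat=n):   # 2**n candidate lists, 8 for n = 3
        ws = list(letters)
        if is_factor(["x", "x"], ws + ws) and is_factor(["y", "y"], ws + ws):
            found.append(ws)
    return found

print(counterexamples(3))     # []: each of the eight cases is contradictory
print(counterexamples(4)[0])  # ['x', 'x', 'y', 'y']: the claim fails for length 4
```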

The last method we want to mention is mismatch. It is designed to prove that two words commute, using the property of a binary code mentioned in Sect. 2 and explained in Sect. 8. Namely, if a product of words from {x, y} starting with x shares a prefix of length at least |xy| with another product of words from {x, y}, this time starting with y, then x and y commute. Examples of usage of the attribute reversed and of all three methods are given in Fig. 1.


8 Appendix: Background Results in Combinatorics 
on Words 


A periodic root r of w need not be primitive, but it is always possible to consider the corresponding primitive root ρ r, which is also a periodic root of w. Note that any word has infinitely many periodic roots, since we allow r to be longer than w. Moreover, a word can have more than one period even if we consider only periods shorter than |w|. Such a possibility is controlled by the Periodicity lemma, often called the Theorem of Fine and Wilf (see [6]):
Lemma 9 (per_lemma_comm). If w has periods u and v, i.e., $w \le_p u \cdot w$ and $w \le_p v \cdot w$, with $|u| + |v| - \gcd(|u|, |v|) \le |w|$, then $u \cdot v = v \cdot u$.


Usually, the weaker test $|u| + |v| \le |w|$ is sufficient to indicate that u and v commute.
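Lemma 9 and the need for its length condition can be explored exhaustively on short binary words. The Python sketch below models periodic roots as in the lemma, via $w \le_p u \cdot w$, and its helper names are hypothetical. It confirms the lemma for all binary words up to length 8. It also exhibits a word of length 6 whose periodic roots of lengths 3 and 5 do not commute, because the bound 3 + 5 - 1 = 7 is not met.

```python
from itertools import product
from math import gcd

def has_periodic_root(w: str, r: str) -> bool:
    """w <=p r·w, i.e., r is a (nonempty) periodic root of w."""
    return len(r) > 0 and (r + w).startswith(w)

def fine_wilf_bound(w: str, u: str, v: str) -> bool:
    """Length condition of Lemma 9: |u| + |v| - gcd(|u|, |v|) <= |w|."""
    return len(u) + len(v) - gcd(len(u), len(v)) <= len(w)

def check_lemma(max_len: int) -> bool:
    """Exhaustively test Lemma 9 on binary words w with 1 <= |w| <= max_len."""
    for n in range(1, max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            # Roots longer than w can never satisfy the bound, and roots of
            # length <= |w| are prefixes of w, so only prefixes need checking.
            roots = [w[:k] for k in range(1, n + 1) if has_periodic_root(w, w[:k])]
            for u in roots:
                for v in roots:
                    if fine_wilf_bound(w, u, v) and u + v != v + u:
                        return False
    return True

print(check_lemma(8))                  # True
w = "abaaba"                           # periods 3 and 5, but 3 + 5 - 1 = 7 > 6
u, v = w[:3], w[:5]
print(has_periodic_root(w, u), has_periodic_root(w, v))  # True True
print(fine_wilf_bound(w, u, v), u + v == v + u)          # False False
```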
Conjugation u ∼ v is characterized as follows:

Lemma 10 (conjugation). If $u z = z v$ for nonempty u, then there exist words r and q and an integer k such that $u = rq$, $v = qr$ and $z = (rq)^k r$.
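The witnesses in Lemma 10 can be read off directly. Since uz = zv, z is a prefix of a power of u, so z = u^k r with r a proper prefix of u. The Python sketch below follows that reading, and its function name is hypothetical.

```python
def conjugation_witness(u: str, v: str, z: str):
    """For u·z = z·v with u nonempty, return (r, q, k) such that
    u = r·q, v = q·r and z = (r·q)^k · r, as in Lemma 10."""
    if not u or u + z != z + v:
        raise ValueError("expected a nonempty u with u·z = z·v")
    k, m = divmod(len(z), len(u))
    r = z[k * len(u):]   # z = u^k · r, where r is the prefix of u of length m < |u|
    q = u[m:]            # the remaining suffix, so that u = r·q
    # Sanity check of the three equalities claimed by the lemma.
    assert u == r + q and v == q + r and z == (r + q) * k + r
    return r, q, k

print(conjugation_witness("ab", "ba", "aba"))   # ('a', 'b', 1)
```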


We have said that w has a periodic root r if it is a prefix of some power of r. If w is a factor, not necessarily a prefix, of a power of r, then it has a periodic root which is a conjugate of r. In particular, if |u| = |v|, then u ∼ v is equivalent to each of u and v being a factor of a power of the other.

Commutation of two words is characterized as follows:

Lemma 11 (comm). $xy = yx$ if and only if $x = t^k$ and $y = t^m$ for some word t and some integers $k, m \ge 0$.

Since every nonempty word has a (unique) primitive root, the word t can be chosen primitive (k or m can be chosen 0 if x or y is empty).
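Lemma 11 yields a practical commutation test: two nonempty words commute exactly when their primitive roots coincide. The Python sketch below is an illustration with hypothetical function names. It computes the primitive root from the first nontrivial occurrence of w in w · w.

```python
def primitive_root(w: str) -> str:
    """Primitive root of a nonempty word w: the shortest t with w = t^k."""
    if not w:
        raise ValueError("the empty word has no primitive root")
    # The first nontrivial occurrence of w in w·w is at the length of its primitive root.
    p = (w + w).find(w, 1)
    return w[:p]

def commute(x: str, y: str) -> bool:
    """Lemma 11: x·y = y·x iff x and y are powers of one word, which may be taken primitive."""
    if not x or not y:
        return True          # the empty word t^0 commutes with every word
    return primitive_root(x) == primitive_root(y)

print(primitive_root("abababab"))                       # ab
print(commute("abab", "ababab"), commute("ab", "ba"))   # True False
```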

We often use the following theorem, called “the theorem of Lyndon and Schützenberger”:

Theorem 5 (Lyndon-Schützenberger). If $x^j y^k = z^\ell$ with $j \ge 2$, $k \ge 2$ and $\ell \ge 2$, then the words x, y and z commute.


A crucial property of a primitive word t is that it cannot be a nontrivial factor of its own square. For a general word u, the equality $u \cdot u = p \cdot u \cdot s$ with nonempty p and s implies that all three words p, s, u commute, that is, have a common primitive root t. This can be seen by writing $u = t^k$ and noticing that a nontrivial occurrence of u inside uu can arise only from a shift by a whole number of copies of t. This idea is often described as “synchronization”.
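Synchronization can be observed on a small example. The Python sketch below is an illustration, and its helper name is hypothetical. It takes u = t^3 for the primitive word t = aab. Every occurrence of u in u · u then starts at a multiple of |t|. Each nontrivial occurrence splits u · u as p · u · s, where p, s and u pairwise commute.

```python
def occurrences(f: str, w: str) -> list:
    """All start positions of f in w, overlapping occurrences included."""
    return [i for i in range(len(w) - len(f) + 1) if w.startswith(f, i)]

t = "aab"            # a primitive word
u = t * 3            # u = t^3, so its primitive root is t
uu = u + u
print(occurrences(t, t + t))   # [0, 3]: t is not a nontrivial factor of t·t
print(occurrences(u, uu))      # [0, 3, 6, 9]: only shifts by whole copies of t

# Each nontrivial occurrence gives u·u = p·u·s with nonempty p, s commuting with u.
for i in occurrences(u, uu)[1:-1]:
    p, s = uu[:i], uu[i + len(u):]
    assert p + u == u + p and s + u == u + s and p + s == s + p
```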

Let x and y be two words that do not commute. The longest common prefix of xy and yx is denoted $\alpha$. Let $c_x$ and $c_y$ be the letters following $\alpha$ in xy and yx, respectively. A crucial property of $\alpha$ is that it is a prefix of any sufficiently long word in ⟨{x, y}⟩. Moreover, if $w = [u_1, u_2, \dots, u_n] \in$ lists {x, y} is such that concat w is longer than $\alpha$, then $\alpha \cdot [c_x]$ is a prefix of concat w if $u_1 = x$, and $\alpha \cdot [c_y]$ is a prefix of concat w if $u_1 = y$. That is why the length of $\alpha$ is sometimes called “the decoding delay” of the binary code {x, y}. Note that this property in particular implies that {x, y} is a code, that is, it does not satisfy any nontrivial relation. It is also behind our method mismatch. Finally, using this property, the proof of Lemma 7 is straightforward.
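The decoding-delay property can be explored mechanically. The Python sketch below uses hypothetical helper names and is not the implementation of mismatch. It first computes $\alpha$, $c_x$ and $c_y$. It then checks every product of a few words from {x, y} that is longer than $\alpha$. Each such product should begin with $\alpha \cdot c_x$ or $\alpha \cdot c_y$, according to its first factor.

```python
from itertools import product

def lcp(a: str, b: str) -> str:
    """Longest common prefix of two words."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

def decoding_delay_prefix(x: str, y: str):
    """Return (alpha, c_x, c_y) for non-commuting x, y, with alpha = lcp(xy, yx)."""
    if x + y == y + x:
        raise ValueError("x and y commute")
    alpha = lcp(x + y, y + x)      # shorter than |xy| because xy != yx
    return alpha, (x + y)[len(alpha)], (y + x)[len(alpha)]

def check_property(x: str, y: str, max_factors: int = 5) -> bool:
    """Check the property on all products of at most max_factors words from {x, y}."""
    alpha, cx, cy = decoding_delay_prefix(x, y)
    for n in range(1, max_factors + 1):
        for ws in product((x, y), repeat=n):
            w = "".join(ws)
            if len(w) > len(alpha):
                expected = alpha + (cx if ws[0] == x else cy)
                if not w.startswith(expected):
                    return False
    return True

print(decoding_delay_prefix("aab", "ab"))   # ('a', 'a', 'b')
print(check_property("aab", "ab"))          # True
```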
Abstract. We describe a system that detects an invariance in a logical formula expressing a math problem and simplifies it by eliminating variables utilizing the invariance. Pre-defined function and predicate symbols in the problem representation language are associated with algebraically indexed types, which signify their invariance property. A Hindley-Milner style type reconstruction algorithm is derived for detecting the invariance of a problem. In the experiment, the invariance-based formula simplification significantly enhanced the performance of a problem solver based on quantifier-elimination for real-closed fields, especially on the problems taken from the International Mathematical Olympiads.


1 Introduction 


It is very common to find an argument marked by the phrase “without loss of generality” (w.l.o.g.) in mathematical proofs written by humans. An argument of this kind is most often based on the symmetry or the invariance in the problem [9].

Suppose that we are going to prove, by an algebraic method, that the three median lines of a triangle meet at a point (Fig. 1). Six real variables are needed to represent three points on a plane. Since the concepts of ‘median lines’ and ‘meeting at a point’ are translation-invariant, we may fix one of the corners at the origin. Furthermore, because these concepts are also invariant under any invertible linear map, we may fix the other two points to, e.g., (1,0) and (0,1). Thus, all six variables are eliminated and the proof becomes much easier.
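Once the corners are fixed as above, the claim becomes a computation without free variables. The Python sketch below uses exact rational arithmetic, and its helper names are illustrative. It checks that the point (1/3, 1/3) lies on all three medians of the triangle with corners (0,0), (1,0) and (0,1). Reading this as a proof of the general statement relies on the invariance argument in the text.

```python
from fractions import Fraction as F

def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def collinear(p, q, r):
    """True iff p, q, r lie on one line (zero cross product)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0

# Corners fixed w.l.o.g. as in the text: one at the origin, the others at (1, 0) and (0, 1).
A, B, C = (F(0), F(0)), (F(1), F(0)), (F(0), F(1))
G = (F(1, 3), F(1, 3))   # candidate common point of the three medians
medians = [(A, midpoint(B, C)), (B, midpoint(C, A)), (C, midpoint(A, B))]
print(all(collinear(P, M, G) for P, M in medians))  # True
```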

W.l.o.g. arguments may thus have a strong impact on the efficiency of inference. They have drawn attention in several research areas, including the relative strength of proof systems (e.g., [2,3,12,20]), propositional SAT (e.g., [1,6,8,17,19]), proof assistants [9], and algebraic methods for geometry problem solving [7,10].

Among others, Iwane and Anai [10] share exactly the same objective as we do: both aim at solving geometry problems stated in natural language, using an algebraic method as the backend. Logical formulas resulting from mechanical translation of problem text tend to be huge and very redundant, while the computational cost of algebraic methods is generally quite sensitive to the size of the input, measured by, e.g., the number of variables. Simplification of the input formula is hence a mandatory part of such a problem-solving system.


© The Author(s) 2022 
Fig. 1. Variable Elimination w.l.o.g. by Invariance


Iwane and Anai’s method operates on first-order formulas of real-closed fields (RCFs), i.e., quantified Boolean combinations of equalities and inequalities between polynomials. They proposed to detect the invariance of a problem by testing the invariance of the polynomials under translation, scaling, and rotation. While conceptually simple, this amounts to discovering the geometric properties of the problem solely from its algebraic representation. The detection of rotational invariance is especially problematic because, to test it on a system of polynomials, one needs to identify all the pairs (or triples) of variables that originate from the x and y (and z) coordinates of the same points. Thus their algorithm for 2D rotational invariance already incurs a search among a large number of possibilities, and they left the detection of 3D rotational invariance untouched. Davenport [7] also suggests essentially the same method.

In this paper, we propose to detect the invariance in a higher-level language than that of RCF. We use algebraically indexed types (AITs), proposed by Atkey et al. [4], as the representation language. In AIT, each symbol in a formula has a type with indices. The indexed type of a function indicates that its output undergoes the same transformation as its input, or a related one. The invariances of the functions are combined via type reconstruction, and in this way an invariance in the problem is detected.

The contribution of the current paper is summarized as follows: 


1. A type reconstruction algorithm for AIT is derived. Atkey et al. [4] laid out 
the formalism of AIT but did not provide a type inference/reconstruction 
algorithm. We devised, for a version of AIT, a type reconstruction algorithm 
that is based on semantic unification in the theory of transformation groups. 

2. A set of variable elimination rules is worked out. Type reconstruction in AIT
discerns a more fine-grained notion of invariance than previous approaches. 
We derived a set of elimination rules that covers all cases. 

3. The practicality of the proposed method is verified; it significantly enhanced 
the performance of a problem solver based on quantifier elimination for RCF, 
especially on the problems from International Mathematical Olympiads. 


In the rest of the paper, we first introduce a math problem solver, on which 
the proposed method was implemented, and summarize the formalism of AIT. 
We then detail the type reconstruction procedure and the variable elimination 
rules. We finally present the experimental results and conclude the paper. 
Fig. 2. Overview of Todai Robot Math Problem Solver


(Show (forall (A B C D X K L M)
  (-> (&& (is-triangle A B C)
          (= (rad-of-angle (angle B C A)) (* 90 (Degree)))
          (= D (foot-of-perp-line-from-to C (line A B)))
          (on X (seg C D)) (! (= X C)) (! (= X D))
          (on K (seg A X))
          (= (length-of (seg B K)) (length-of (seg B C)))
          (on L (seg B X))
          (= (length-of (seg A L)) (length-of (seg A C)))
          (intersect (seg A L) (seg B K) M))
      (= (length-of (seg M L)) (length-of (seg M K))))))


Fig. 3. Example of Manually Formalized Problem (IMO 2012, Problem 5) 


2 Todai Robot Math Solver and Problem Library 


This work is a part of the development of the Todai Robot Math Problem Solver 
(henceforth TOROBOMATH) [13-16]. Figure 2 presents an overview of the system. 
TOROBOMATH is targeted at solving pre-university math problems. Our long- 
term goal is to develop a system that solves problems stated in natural language. 

The natural language processing (NLP) module of the system accepts a problem text and derives its logical representation through syntactic analysis. Currently, it produces a correct logical form for around 50% of sentences [13], which is not high enough to cover a wide variety of problems. Although the motivation behind the current work is to cope with the huge formulas produced by the NLP module, we instead used a library of manually formalized problems for the evaluation of the formula simplification procedure.

The problem library has been developed along with the TOROBOMATH sys- 
tem. It contains approximately one thousand math problems collected from 
several sources including the International Mathematical Olympiads (IMOs). 
Figure 3 presents a problem that was taken from IMO 2012. 

The problems in the library are manually encoded in a polymorphic higher-order language, which is the same language as the output of the NLP module. Table 1 lists some of its primitive types. The language includes a large set of predicate and function symbols that are tailored for formalizing pre-university math problems. Currently, 1387 symbols are defined using 2808 axioms. Figure 4 shows, as an example, the axiom that defines the predicate maximum.
Table 1. Example of Primitive Types

  truth values        Bool
  numbers             Z (integers), Q (rationals), R (reals), C (complex)
  vectors             2d.Vec, 3d.Vec
  geometric objects   2d.Shape, 3d.Shape
  angles              2d.Angle, 3d.Angle
  sets and lists      SetOf(α), ListOf(α)

(axiom def_maximum
  (forall (set max)
    (<-> (maximum set max)
         (& (elem max set)
            (forall (v)
              (-> (elem v set)
                  (<= v max)))))))

Fig. 4. Example of Axiom


The problem solving module of TOROBOMATH accepts a formalized problem and iteratively rewrites it using: (1) basic transformations such as $\forall x.(x = \alpha \rightarrow \varphi(x)) \Leftrightarrow \varphi(\alpha)$ and beta-reduction, (2) simplification of expressions, such as polynomial division and integration, by computer algebra systems (CASs), and (3) the axioms that define the predicate and function symbols.

Once the rewritten formula is in the language of real-closed fields (RCFs) or Peano arithmetic, it is handed to a solver for the theory. For RCF formulas, we use an implementation of the quantifier-elimination (QE) procedure for RCF based on cylindrical algebraic decomposition. Finally, we solve the resulting quantifier-free formula with CASs and obtain the answer. The time complexity of RCF-QE is quite high; it is doubly exponential in the number of variables [5]. Hence, the simplification of the formula before RCF-QE is a crucial step.


3 Algebraically Indexed Types 


This section summarizes the framework of AIT. We refrain from presenting it in full generality and describe its application to geometry ([4, §2]), together with the restrictions we imposed on it when incorporating it into the type system of TOROBOMATH.

In AIT, some of the primitive types have associated indices. An index represents a transformation on the objects of that type. For instance, in Vec(B, t), the index B stands for an invertible linear transformation and t stands for a translation. Index variables bound by universal quantifiers signify that a function of that type is invariant under any transformations indicated by the indices, e.g.,

$$\mathrm{midpoint} : \forall B{:}\mathrm{GL}_2.\ \forall t{:}\mathrm{T}_2.\ \mathrm{Vec}(B,t) \to \mathrm{Vec}(B,t) \to \mathrm{Vec}(B,t).$$


The type of midpoint certifies that, when two points P and Q undergo an 
arbitrary affine transformation, the midpoint of P and Q moves accordingly. 


3.1 Sort and Index Expression 


The sort of an index signifies the kind of transformations represented by the index. We assume the set SORT of index sorts includes $\mathrm{GL}_k$ (k = 1, 2, 3) (general linear transformations), $\mathrm{O}_k$ (k = 2, 3) (orthogonal transformations), and $\mathrm{T}_k$ (k = 2, 3) (translations). In the type of midpoint, B is of sort $\mathrm{GL}_2$ and t is of sort $\mathrm{T}_2$.
An index expression is composed of index variables and index operators. In the current paper, we use the following operators: $(+, -, 0)$ are addition, negation, and unit of $\mathrm{T}_k$ (k = 2, 3); $(\cdot, {}^{-1}, 1)$ are multiplication, inverse, and unit of $\mathrm{GL}_k$ and $\mathrm{O}_k$; det is the determinant; $|\cdot|$ is the absolute value. An index context $\Delta$ is a list of index variables paired with their sorts: $\Delta = i_1{:}S_1, i_2{:}S_2, \dots, i_n{:}S_n$.

The well-sortedness of an index expression e of sort S, written $\Delta \vdash e : S$, is defined analogously to well-typedness in simple type theory.


3.2 Type, Term, and Typing Judgement 


The set of primitive types, PRIMTYPE = {Bool, R, 2d.Vec, 3d.Vec, 2d.Shape, ...}, is the same as that in the language of TOROBOMATH. A function $\mathrm{tyArity} : \mathrm{PRIMTYPE} \to \mathrm{SORT}^*$ specifies the number and sorts of indices appropriate for the primitive types: e.g., $\mathrm{tyArity}(\mathrm{2d.Vec}) = (\mathrm{GL}_2, \mathrm{T}_2)$.

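A judgement $\Delta \vdash A$ type means that type A is well-formed and well-indexed with respect to an index context $\Delta$. Here are the derivation rules:

$$\frac{X \in \mathrm{PRIMTYPE} \quad \mathrm{tyArity}(X) = (S_1, \dots, S_m) \quad \{\Delta \vdash e_j : S_j\}_{1 \le j \le m}}{\Delta \vdash X(e_1, \dots, e_m)\ \mathrm{type}}\ \textsc{TyPrim}$$

$$\frac{\Delta \vdash A\ \mathrm{type} \quad \Delta \vdash B\ \mathrm{type}}{\Delta \vdash A \to B\ \mathrm{type}}\ \textsc{TyArr} \qquad \frac{\Delta, i{:}S \vdash A\ \mathrm{type}}{\Delta \vdash \forall i{:}S.A\ \mathrm{type}}\ \textsc{TyForall}$$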


While Atkey et al.’s system is formulated in the style of System F, we allow 
the quantifiers only at the outermost (prenex) position. The restriction permits 
an efficient type reconstruction algorithm analogous to Hindley-Milner’s, while 
being expressive enough to capture the invariance of the pre-defined functions 
in TOROBOMATH and the invariance in the majority of math problems. 

The well-typedness of a term M, written $\Delta; \Gamma \vdash M : A$, is judged with respect to an index context $\Delta$ and a typing context $\Gamma = x_1 : A_1, \dots, x_n : A_n$. A typing context is a list of variables with their types. A special context $\Gamma_{\mathrm{ops}}$ consists of the pre-defined symbols and their types, e.g., $+ : \forall s{:}\mathrm{GL}_1.\ \mathrm{R}(s) \to \mathrm{R}(s) \to \mathrm{R}(s) \in \Gamma_{\mathrm{ops}}$. We assume $\Gamma_{\mathrm{ops}}$ is always available in the typing derivation and suppress it in a judgement. The typing rules are analogous to those for lambda calculus with rank-1 polymorphism except for TyEq:


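$$\frac{x : A \in \Gamma}{\Delta; \Gamma \vdash x : A}\ \textsc{Var} \qquad \frac{\Delta; \Gamma \vdash M : \forall i{:}S.A \quad \Delta \vdash e : S}{\Delta; \Gamma \vdash M : A\{i \mapsto e\}}\ \textsc{UnivInst} \qquad \frac{\Delta; \Gamma, x : A \vdash M : B}{\Delta; \Gamma \vdash \lambda x.M : A \to B}\ \textsc{Abs}$$

$$\frac{\Delta; \Gamma \vdash M : A \to B \quad \Delta; \Gamma \vdash N : A}{\Delta; \Gamma \vdash M N : B}\ \textsc{App} \qquad \frac{\Delta; \Gamma \vdash M : A \quad \Delta \vdash A = B}{\Delta; \Gamma \vdash M : B}\ \textsc{TyEq}$$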


In the Abs and App rules, the meta-variables A and B only designate types without quantifiers. In the UnivInst rule, $A\{i \mapsto e\}$ is the result of substituting e for i in A. The ‘polymorphism’ of the types with quantifiers hence takes place only when a pre-defined symbol (e.g., midpoint) enters a derivation via the Var rule and the bound index variable is then instantiated via the UnivInst rule.




The type equivalence judgement $\Delta \vdash A = B$ in the TyEq rule equates two types involving semantically equivalent index expressions; thus, e.g., $s{:}\mathrm{GL}_1 \vdash \mathrm{R}(s \cdot s^{-1}) = \mathrm{R}(1)$ and $O{:}\mathrm{O}_2 \vdash \mathrm{R}(|\det O|) = \mathrm{R}(1)$.


3.3 Index Erasure Semantics and Transformational Interpretation 


The abstraction theorem for AIT [4] enables us to know the invariance of a term from its type. The theorem relates two kinds of interpretations of types and terms: index erasure semantics and relational interpretations. We will restate the theorem with what we here call transformational interpretations (t-interpretations hereafter), instead of the relational interpretations. This suffices for the purpose of justifying our algorithm and makes the idea of the theorem easier to grasp.

The index-erasure semantics of a primitive type $X(e_1, \dots, e_n)$ is determined only by X. We thus write $|X(e_1, \dots, e_n)| = |X|$. The interpretation $|X|$ is the set of mathematical objects intended for the type: e.g., $|\mathrm{2d.Vec}(B,t)| = |\mathrm{2d.Vec}| = \mathbb{R}^2$ and $|\mathrm{R}(s)| = |\mathrm{R}| = \mathbb{R}$. The index-erasure semantics of a non-primitive type is determined by the type structure: $|A \to B| = |A| \to |B|$ and $|\forall i{:}S.T| = |T|$.

The index-erasure semantics of a typing context $\Gamma = x_1{:}T_1, \dots, x_n{:}T_n$ is the direct product of the domains of the variables: $|\Gamma| = |T_1| \times \cdots \times |T_n|$. The erasure semantics of a term $\Delta; \Gamma \vdash M : A$ is a function of the values assigned to its free variables, $|M| : |\Gamma| \to |A|$, and is defined as usual (see, e.g., [18,21]).

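The t-interpretation of a type T, denoted by $\llbracket T \rrbracket$, is a function from the assignments to the index variables to a transformation on $|T|$. To be precise, we first define the semantics of an index context $\Delta = i_1{:}S_1, \dots, i_n{:}S_n$ as the direct product of the interpretations of the sorts: $\llbracket \Delta \rrbracket = \llbracket S_1 \rrbracket \times \cdots \times \llbracket S_n \rrbracket$, where $\llbracket S_1 \rrbracket, \dots, \llbracket S_n \rrbracket$ are the intended sets of transformations: e.g., $\llbracket \mathrm{GL}_2 \rrbracket = \mathrm{GL}_2$ and $\llbracket \mathrm{T}_2 \rrbracket = \mathrm{T}_2$. The interpretation of an index expression e of sort S is a function $\llbracket e \rrbracket : \llbracket \Delta \rrbracket \to \llbracket S \rrbracket$ that is determined by the structure of the expression; for $\rho \in \llbracket \Delta \rrbracket$,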


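$$\llbracket f(e_1, \dots, e_n) \rrbracket(\rho) = \llbracket f \rrbracket(\llbracket e_1 \rrbracket(\rho), \dots, \llbracket e_n \rrbracket(\rho)), \qquad \llbracket i \rrbracket(\rho) = \rho(i),$$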


where, in the last equation, we regard $\rho \in \llbracket \Delta \rrbracket$ as a function from index variables to their values. The index operations det and $|\cdot|$ are interpreted as intended. The t-interpretation of a primitive type $X(e_1, \dots, e_n)$ is then determined by X and the structures of the index expressions $e_1, \dots, e_n$. The t-interpretation of Vec and Shape is the affine transformation of vectors and geometric objects parametrized by $\rho \in \llbracket \Delta \rrbracket$; for index expressions $\beta{:}\mathrm{GL}_2$ and $\tau{:}\mathrm{T}_2$,


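$$\llbracket \mathrm{Vec}(\beta, \tau) \rrbracket(\rho) : \mathbb{R}^2 \ni x \mapsto M_{\llbracket \beta \rrbracket(\rho)}\, x + v_{\llbracket \tau \rrbracket(\rho)} \in \mathbb{R}^2$$
$$\llbracket \mathrm{Shape}(\beta, \tau) \rrbracket(\rho) : \mathcal{P}(\mathbb{R}^2) \ni S \mapsto \{ M_{\llbracket \beta \rrbracket(\rho)}\, x + v_{\llbracket \tau \rrbracket(\rho)} \mid x \in S \} \in \mathcal{P}(\mathbb{R}^2),$$

where $M_{\llbracket \beta \rrbracket(\rho)}$ and $v_{\llbracket \tau \rrbracket(\rho)}$ are the representation matrix and vector of $\llbracket \beta \rrbracket(\rho)$ and $\llbracket \tau \rrbracket(\rho)$, and $\mathcal{P}(\mathbb{R}^2)$ denotes the power set of $\mathbb{R}^2$. Similarly, for the real numbers,

$$\llbracket \mathrm{R}(\sigma) \rrbracket(\rho) : \mathbb{R} \ni x \mapsto \llbracket \sigma \rrbracket(\rho)\, x \in \mathbb{R}.$$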
That is, $\llbracket \mathrm{R}(\sigma) \rrbracket(\rho)$ is a change of scale with the scaling factor determined by the expression $\sigma{:}\mathrm{GL}_1$ and the assignment $\rho$. For a primitive type X with no indices, its t-interpretation is the identity map on $|X|$, i.e., $\llbracket X \rrbracket(\rho) = \mathrm{id}_{|X|}$.

The t-interpretation of a function type $A \to B$ is a higher-order function that maps a (mathematical) function $f : |A| \to |B|$ to another function on the same domain and codomain such that $\llbracket A \to B \rrbracket(\rho)(f) = \llbracket B \rrbracket(\rho) \circ f \circ (\llbracket A \rrbracket(\rho))^{-1}$. It is easy to check that this interpretation is compatible with currying. Equivalently, we may say that if $g = \llbracket A \to B \rrbracket(\rho)(f)$, then f and g are in the commutative relation $g \circ \llbracket A \rrbracket(\rho) = \llbracket B \rrbracket(\rho) \circ f$. The typing derivation in AIT is a way to ‘pull out’ the effect of the transformation $\llbracket A \rrbracket(\rho)$ on a free variable deep inside a term by combining such commutative relations.
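The definition of the t-interpretation of a function type can be tried out on the scaling interpretation of R(σ). The Python sketch below assumes, for illustration only, that squaring has type R(s) → R(s · s); this type is not taken from $\Gamma_{\mathrm{ops}}$. Let s be assigned the value c. Transforming squaring by $\llbracket B \rrbracket(\rho) \circ f \circ \llbracket A \rrbracket(\rho)^{-1}$ then returns squaring itself. The commutative relation $g \circ \llbracket A \rrbracket(\rho) = \llbracket B \rrbracket(\rho) \circ f$ also holds.

```python
from fractions import Fraction as F

def scale(c):
    """t-interpretation of R(sigma) when sigma evaluates to c: x |-> c * x."""
    return lambda x: c * x

def transform(t_A_inv, t_B, f):
    """[[A -> B]](rho)(f) = [[B]](rho) o f o [[A]](rho)^(-1)."""
    return lambda x: t_B(f(t_A_inv(x)))

def square(x):
    return x * x          # assumed type R(s) -> R(s . s)

c = F(3)                                   # the value assigned to s by rho
g = transform(scale(1 / c), scale(c * c), square)
xs = [F(-2), F(0), F(5, 7)]
print(all(g(x) == square(x) for x in xs))                          # True
print(all(g(scale(c)(x)) == scale(c * c)(square(x)) for x in xs))  # True
```

Squaring is fixed by the transformation here, which is what the abstraction theorem predicts for a closed term of this type.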

The t-interpretation of a fully-quantified type is the identity map on its erasure semantics: $\llbracket \forall i_1{:}S_1. \dots \forall i_n{:}S_n. T \rrbracket = \mathrm{id}_{|T|}$. We do not define that of partially-quantified types because we do not need it to state the abstraction theorem.


3.4 Abstraction Theorem 


The abstraction theorem for AIT enables us to detect the invariance of (the erasure semantics of) a term under a certain set of transformations on its free variables. We first define the t-interpretation of the typing context $\Gamma = x_1 : T_1, \dots, x_n : T_n$ as a simultaneous transformation of $\eta = (v_1, \dots, v_n) \in |\Gamma|$:

$$\llbracket \Gamma \rrbracket(\rho)(\eta) = (\llbracket T_1 \rrbracket(\rho)(v_1), \dots, \llbracket T_n \rrbracket(\rho)(v_n)) \in |\Gamma|.$$
We now present a version of the abstraction theorem, restricted to the case of a term of quantifier-free type and restated with the t-interpretation:

Theorem 1 (Abstraction [4], restated using transformational interpretation). If A is a quantifier-free type and $\Delta; \Gamma \vdash M : A$, then for all $\rho \in \llbracket \Delta \rrbracket$ and all $\eta \in |\Gamma|$, we have $\llbracket A \rrbracket(\rho)(|M|(\eta)) = |M|(\llbracket \Gamma \rrbracket(\rho)(\eta))$.

Here we provide two easy corollaries of the theorem. The first one is utilized to eliminate variables from a formula while preserving equivalence.

Corollary 1. If $\Delta; x_1 : T_1, \dots, x_n : T_n \vdash \phi(x_1, \dots, x_n) : \mathrm{Bool}$, then for all $\rho \in \llbracket \Delta \rrbracket$, we have $\phi(x_1, \dots, x_n) \Leftrightarrow \phi(\llbracket T_1 \rrbracket(\rho)(x_1), \dots, \llbracket T_n \rrbracket(\rho)(x_n))$.

This follows from the abstraction theorem and the fact that $\llbracket \mathrm{Bool} \rrbracket(\rho) = \mathrm{id}_{|\mathrm{Bool}|}$ for any $\rho$. It indicates that, without loss of generality, we may ‘fix’ some of the variables to, e.g., zeros by appropriately choosing $\rho$.

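The second corollary is for providing more intuition about the theorem.

Corollary 2. If $\vdash \lambda x_1. \dots \lambda x_n. f(x_1, \dots, x_n) : \forall \Delta.\ T_1 \to \cdots \to T_n \to T_0$, then, for all $\rho \in \llbracket \Delta \rrbracket$ and all $v_i \in |T_i|$ $(i = 1, \dots, n)$,

$$\llbracket T_0 \rrbracket(\rho)(|f|(v_1, \dots, v_n)) = |f|(\llbracket T_1 \rrbracket(\rho)(v_1), \dots, \llbracket T_n \rrbracket(\rho)(v_n)).$$

In the statement, $\forall \Delta$ signifies the universal quantification over all index variables in $\Delta$. By this corollary, for instance, we can tell from the type of midpoint that, for all $x_1, x_2 \in \mathbb{R}^2$ and for all $g \in \mathrm{GL}_2$ and $t \in \mathrm{T}_2$,

$$|\mathrm{midpoint}|(M_g x_1 + v_t, M_g x_2 + v_t) = M_g\, |\mathrm{midpoint}|(x_1, x_2) + v_t.$$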
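The instance of Corollary 2 for midpoint can be checked numerically for a particular affine map. The Python sketch below uses exact rational arithmetic. The matrix, translation vector and points are arbitrary choices made only for this illustration.

```python
from fractions import Fraction as F

def midpoint(p, q):
    return tuple((a + b) / 2 for a, b in zip(p, q))

def affine(M, v):
    """x |-> M x + v for a 2x2 matrix M (given by rows) and a vector v."""
    return lambda x: tuple(sum(M[i][k] * x[k] for k in range(2)) + v[i] for i in range(2))

M = ((F(2), F(1)), (F(-1), F(3)))   # an invertible matrix (det = 7)
v = (F(5), F(-4))
x1, x2 = (F(1), F(2)), (F(-3), F(7, 2))
g = affine(M, v)
print(midpoint(g(x1), g(x2)) == g(midpoint(x1, x2)))  # True
```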
3.5 Restriction on the Index Expressions of Sort $\mathrm{GL}_k$/$\mathrm{O}_k$ $(k \ge 2)$

We found that type reconstruction in AIT is far more straightforward when we assume that an index expression of sort $\mathrm{GL}_k$ or $\mathrm{O}_k$ (k ≥ 2) includes at most one index variable of sort $\mathrm{GL}_k$ or $\mathrm{O}_k$ that is not inside the determinant operator. Under this assumption, any expression e of sort $\mathrm{GL}_k$ or $\mathrm{O}_k$ can be written in the form

$$e = \prod_{i \in I} s_i^{w_i} \cdot \prod_{i \in I} |s_i|^{x_i} \cdot \prod_{j \in J} \det(B_j)^{y_j} \cdot \prod_{j \in J} |\det(B_j)|^{z_j} \cdot B_0^{\delta},$$

where $\{s_i\}_{i \in I}$ are of sort $\mathrm{GL}_1$, $\{B_0\} \cup \{B_j\}_{j \in J}$ are of sort $\mathrm{GL}_k$ or $\mathrm{O}_k$, $w_i, x_i, y_j, z_j \in \mathbb{Z}$, and $\delta \in \{0, 1\}$. We henceforth say that an expression e in the above form satisfies the head variable property and call $B_0$ the head variable of e.

Empirically, this restriction is not too restrictive; as far as we are aware, the invariance of all the pre-defined functions and predicates in TOROBOMATH is expressible with an indexed type satisfying it.


4 Invariance Detection Through Type Reconstruction 


We need type reconstruction in AIT for two purposes: to infer the invariance of the pre-defined symbols in TOROBOMATH and to infer the invariance in a math problem. To this end, we only have to derive the judgement $\Delta; \Gamma \vdash \phi : \mathrm{Bool}$, where $\phi$ is either a defining axiom of a symbol or a formula of a problem. For a pre-defined symbol s, from a judgement $\Delta; s : T, \dots \vdash \phi : \mathrm{Bool}$, we know that s is of type T and has the invariance signified by T. For a problem $\phi$, from the judgement $\Delta; x_1 : T_1, \dots, x_n : T_n \vdash \phi : \mathrm{Bool}$, we know the invariance of $\phi$ under the transformation of the free variables $x_1, \dots, x_n$ according to $\llbracket T_1 \rrbracket, \dots, \llbracket T_n \rrbracket$.

Since all types are in prenex form, we can find the typing derivation by a procedure analogous to the Hindley-Milner (H-M) algorithm. It consists of two steps: deriving equations among index expressions, and solving them. The procedure for solving the equations in $\mathrm{T}_2$/$\mathrm{T}_3$ is essentially the same as in the type inference for Kennedy’s unit-of-measure types [11], which are a precursor of AIT. Further development is required to solve the equations in $\mathrm{GL}_2$/$\mathrm{GL}_3$, even under the restriction on the form of index expressions mentioned in Sect. 3.5, due to the existence of the index operations $|\cdot|$ and det.


4.1 Equation Derivation 


We first assign a type variable $\alpha_i$ to each subterm $t_i$ in $\phi$. Then, for a subterm $t_i$ of the form $t_j t_k$ (i.e., application of $t_j$ to $t_k$), we have the equation $\alpha_j = \alpha_k \to \alpha_i$. The case of a subterm $t_i$ of the form $\lambda x.t_j$ is also analogous to H-M and we omit it here. For a leaf term (i.e., a variable) $t_i$, if it is one of the pre-defined symbols and $t_i : \forall i_1{:}S_1. \dots \forall i_n{:}S_n. T \in \Gamma_{\mathrm{ops}}$, we set $\alpha_i = T\{i_1 \mapsto \beta_1, \dots, i_n \mapsto \beta_n\}$, where $\{i_1 \mapsto \beta_1, \dots, i_n \mapsto \beta_n\}$ stands for the substitution of fresh variables $\beta_1, \dots, \beta_n$ for $i_1, \dots, i_n$. By solving the equations for the type and index variables $\{\alpha_i\}$ and $\{\beta_j\}$, we reconstruct the most general indexed types of all the subterms.
For example, consider the following axiom defining perpendicular:

$$\forall v_1. \forall v_2. (\mathrm{perpendicular}(v_1, v_2) \leftrightarrow \mathrm{inner\text{-}prod}(v_1, v_2) = 0),$$

and suppose that inner-prod is in $\Gamma_{\mathrm{ops}}$. We are going to reconstruct the type of perpendicular. The type of inner-prod is

$$\mathrm{inner\text{-}prod} : \forall s_1, s_2{:}\mathrm{GL}_1.\ \forall O{:}\mathrm{O}_2.\ \mathrm{Vec}(s_1 O, 0) \to \mathrm{Vec}(s_2 O, 0) \to \mathrm{R}(s_1 \cdot s_2)$$

and it is instantiated as $\mathrm{inner\text{-}prod} : \mathrm{Vec}(s_1 O, 0) \to \mathrm{Vec}(s_2 O, 0) \to \mathrm{R}(s_1 \cdot s_2)$, where $s_1$, $s_2$, and O are fresh variables. Since the type of perpendicular in the non-AIT version of our language is Vec → Vec → Bool, we assign fresh variables to all indices in the primitive types and have:

$$\mathrm{perpendicular} : \mathrm{Vec}(\beta_1, \tau_1) \to \mathrm{Vec}(\beta_2, \tau_2) \to \mathrm{Bool}.$$

Since perpendicular is applied to $v_1$ and $v_2$, the types of $v_1$ and $v_2$ are equated to $\mathrm{Vec}(\beta_1, \tau_1)$ and $\mathrm{Vec}(\beta_2, \tau_2)$. Additionally, since inner-prod is also applied to $v_1$ and $v_2$, we have the following equations:

$$\mathrm{Vec}(s_1 O, 0) = \mathrm{Vec}(\beta_1, \tau_1), \qquad \mathrm{Vec}(s_2 O, 0) = \mathrm{Vec}(\beta_2, \tau_2) \qquad (4.1)$$

If we have an equation between the same primitive type, unifying both sides of the equation in turn yields one or more equations between index expressions, i.e., if we have $X(e_1, \dots, e_m) = X(e'_1, \dots, e'_m)$, then we have $e_1 = e'_1, \dots, e_m = e'_m$. For Eq. (4.1), we hence have $s_1 O = \beta_1$, $s_2 O = \beta_2$, $0 = \tau_1$, and $0 = \tau_2$. Thus, by recursively unifying all the equated types, we are left with a system of equations between index expressions.
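The structural step that turns type equations such as Eq. (4.1) into index equations can be sketched as follows. This is a simplified Python illustration, not the TOROBOMATH implementation. Types are nested tuples, and there are no type variables, so no substitution is needed. Index expressions are opaque strings that are simply paired up.

```python
def unify_types(a, b, eqs):
    """Structurally unify two quantifier-free types, collecting index equations.

    Types are tuples: ("->", A, B) for arrows and (X, e1, ..., em) for a primitive
    type X with index expressions e1..em (kept as opaque strings).
    """
    if a[0] == "->" and b[0] == "->":
        unify_types(a[1], b[1], eqs)
        unify_types(a[2], b[2], eqs)
    elif a[0] == b[0] and len(a) == len(b):
        eqs.extend(zip(a[1:], b[1:]))     # X(e1..em) = X(e1'..em')  =>  ei = ei'
    else:
        raise TypeError(f"cannot unify {a} with {b}")
    return eqs

# Eq. (4.1): argument types of inner-prod against those assigned to v1 and v2.
eqs = []
unify_types(("Vec", "s1*O", "0"), ("Vec", "beta1", "tau1"), eqs)
unify_types(("Vec", "s2*O", "0"), ("Vec", "beta2", "tau2"), eqs)
print(eqs)
# [('s1*O', 'beta1'), ('0', 'tau1'), ('s2*O', 'beta2'), ('0', 'tau2')]
```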


4.2 Equation Solving 


To solve the derived equations between index expressions, we need to depart from the analogy with the H-M algorithm. Namely, instead of applying syntactic unification, we need semantic unification, i.e., we solve the equations as simultaneous equations in the transformation groups.

We first order the equations with respect to the sort of the equated expressions. We then process them in the order $\mathrm{T}_2/\mathrm{T}_3 \to \mathrm{GL}_2/\mathrm{GL}_3 \to \mathrm{GL}_1$ as follows.¹

First, since equations of sort $\mathrm{T}_2/\mathrm{T}_3$ are always of the form $\sum_i a_i t_i = 0$ $(a_i \in \mathbb{Z})$, where $\{t_i\}$ are variables of sort $\mathrm{T}_k$ ($k \in \{2, 3\}$), we can solve them as a linear homogeneous system. Although the solution may involve rational coefficients, as in $t_i = \sum_j \frac{n_{ij}}{m_{ij}} t_j$ $(n_{ij}, m_{ij} \in \mathbb{Z})$, we can clear the denominators by introducing new variables $t'_j$ such that $t_j = \mathrm{lcm}\{m_{ij}\}_i \cdot t'_j$.
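The treatment of $\mathrm{T}_2/\mathrm{T}_3$ equations can be sketched as Gauss-Jordan elimination over the rationals, followed by rescaling of the free variables. In the Python sketch below, function and variable names are hypothetical. Each equation is a dictionary of integer coefficients. Each free variable $t_j$ is replaced by $\mathrm{lcm} \cdot t'_j$, and every variable is returned as an integer combination of the fresh variables.

```python
from fractions import Fraction
from math import lcm

def solve_translation_eqs(eqs, variables):
    """Solve homogeneous equations sum_i a_i * t_i = 0 (integer a_i) in sort T_k.

    eqs: list of {variable: integer coefficient}.
    Returns {variable: {fresh variable: integer coefficient}}: every variable is an
    integer combination of fresh variables t' introduced for the free variables.
    """
    n = len(variables)
    rows = [[Fraction(e.get(v, 0)) for v in variables] for e in eqs]
    pivots, r = [], 0
    for c in range(n):                          # Gauss-Jordan elimination over Q
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue                            # no pivot: variables[c] stays free
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [x / piv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, c))
        r += 1
    pivot_cols = {c for _, c in pivots}
    free = [variables[c] for c in range(n) if c not in pivot_cols]
    # Each pivot variable is minus the rational combination of free variables in its row.
    sol = {v: {v: Fraction(1)} for v in free}
    for i, c in pivots:
        sol[variables[c]] = {variables[k]: -rows[i][k]
                             for k in range(n) if k not in pivot_cols and rows[i][k] != 0}
    # Clear denominators: t_j = lcm(denominators of t_j's coefficients) * t_j'.
    scale = {f: lcm(*(e[f].denominator for e in sol.values() if f in e)) for f in free}
    return {v: {f + "'": int(a * scale[f]) for f, a in e.items()} for v, e in sol.items()}

print(solve_translation_eqs([{"t1": 2, "t2": -3}], ["t1", "t2"]))
# {'t2': {"t2'": 2}, 't1': {"t2'": 3}}

# Translation equations of the kind that arise for midpoint:
# t1 = tau1, t2 = tau2, 0 = t1 + t2, tau3 = 0.
mid = solve_translation_eqs(
    [{"t1": 1, "tau1": -1}, {"t2": 1, "tau2": -1}, {"t1": 1, "t2": 1}, {"tau3": 1}],
    ["t1", "t2", "tau1", "tau2", "tau3"])
print(mid["tau1"], mid["tau2"], mid["tau3"])   # {"tau2'": -1} {"tau2'": 1} {}
```

In the second example the two arguments end up translated in opposite directions ($\tau_1 = -\tau_2'$, $\tau_2 = \tau_2'$), and the result is translated by 0.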

Next, by the head variable property, equations of sort $\mathrm{GL}_2/\mathrm{GL}_3$ (henceforth $\mathrm{GL}_{\ge 2}$) are always of the form $\sigma_1 B_1 = \sigma_2 B_2$, where $\sigma_1$ and $\sigma_2$ are index expressions of sort $\mathrm{GL}_1$, and $B_1$ and $B_2$ are the head variables of sort $\mathrm{GL}_{\ge 2}$. We decompose these equations according to Table 2, which summarizes the following argument. Let E denote the identity transformation. Since $\sigma_1 B_1 = \sigma_2 B_2 \Leftrightarrow \sigma_1^{-1} \sigma_2 E = B_1 B_2^{-1}$, there must be some $s \in \mathrm{GL}_1$ such that $B_1 B_2^{-1} = sE$ and $\sigma_1^{-1} \sigma_2 = s$. Moreover, by the superset-subset relation between the sorts of $B_1$ and $B_2$, e.g., $\mathrm{O}_2 \subset \mathrm{GL}_2$ for $B_1 : \mathrm{O}_2$ and $B_2 : \mathrm{GL}_2$, we can express the variable of the broader sort with the other as a parameter.

¹ In this subsection, $\mathrm{GL}_2$, $\mathrm{GL}_3$, $\mathrm{O}_2$, and $\mathrm{O}_3$ are collectively denoted as $\mathrm{GL}_2/\mathrm{GL}_3$ or $\mathrm{GL}_{\ge 2}$.

The algorithm for $\mathrm{GL}_{\ge 2}$ equations works as follows. First, we initialize the solution set with the empty substitution: $S \leftarrow \{\}$. For each $\mathrm{GL}_{\ge 2}$ equation $\sigma_1 B_1 = \sigma_2 B_2$, we look up Table 2 and find the $\mathrm{GL}_{\ge 2}$ solution $B_i \mapsto s B_j$ and one or more new $\mathrm{GL}_1$ equations. We add the new equations to the current set of $\mathrm{GL}_1$ equations, and apply the solution $B_i \mapsto s B_j$ to all the remaining $\mathrm{GL}_1$ and $\mathrm{GL}_{\ge 2}$ equations. We also compose the $\mathrm{GL}_{\ge 2}$ solution $B_i \mapsto s B_j$ with the current solution set: $S \leftarrow S \circ \{B_i \mapsto s B_j\}$.
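The table lookup can be sketched as a small Python function that reproduces the rows of Table 2. The encoding is hypothetical. Equations are returned as strings, "E" marks a side without a head variable ($\delta = 0$), and the caller is assumed to orient the equation so that $B_i$ has the broader sort.

```python
# Sort ranks: "E" means the side has no head variable (delta = 0); O_k is a subset of GL_k.
RANK = {"E": 0, "O": 1, "GL": 2}

def decompose(sort_i: str, sort_j: str, same_head: bool = False):
    """Decompose sigma_i * B_i = sigma_j * B_j following the rows of Table 2.

    sort_i, sort_j: "GL", "O" or "E" for the heads B_i and B_j, where B_i must be
    of the broader (or equal) sort. Returns (substitution or None, GL_1 equations);
    s denotes a fresh GL_1 variable.
    """
    if RANK[sort_i] < RANK[sort_j]:
        raise ValueError("swap the two sides so that B_i has the broader sort")
    if same_head or sort_i == "E":        # identical heads (or no heads): compare scalars
        return None, ["sigma_i = sigma_j"]
    target = "E" if sort_j == "E" else "B_j"
    eqs = ["s*sigma_i = sigma_j"]
    if sort_i == "O":                     # s*B_j (or s*E) must remain orthogonal
        eqs.append("|s| = 1")
    return f"B_i -> s*{target}", eqs

for row in [("O", "E"), ("GL", "E"), ("O", "O"), ("GL", "O"), ("GL", "GL")]:
    print(row, decompose(*row))
```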

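By processing all $\mathrm{GL}_{\ge 2}$ equations as above, we are left with a partial solution S and a system of $\mathrm{GL}_1$ equations, each of which is of the following form:

$$\prod_{i \in I} s_i^{w_i} \cdot \prod_{i \in I} |s_i|^{x_i} \cdot \prod_{j \in J} \det(B_j)^{y_j} \cdot \prod_{j \in J} |\det(B_j)|^{z_j} = 1 \qquad (w_i, x_i, y_j, z_j \in \mathbb{Z}),$$

where we assume that $\{s_i\}_{i \in I}$ are all the $\mathrm{GL}_1$ variables, $\{B_j\}_{j \in J}$ are all the remaining $\mathrm{GL}_{\ge 2}$ variables, and $I \cap J = \emptyset$. Letting $u_i = s_i \cdot |s_i|^{-1}$, $v_i = |s_i|$, $u_j = \det(B_j) \cdot |\det(B_j)|^{-1}$, and $v_j = |\det(B_j)|$, we have $s_i = u_i v_i$ and $\det(B_j) = u_j v_j$ for all $i \in I$ and $j \in J$. Using them, we have

$$\prod_{i \in I} u_i^{w_i} v_i^{w_i + x_i} \cdot \prod_{j \in J} u_j^{y_j} v_j^{y_j + z_j} = 1.$$

Since $u_i, u_j \in \{+1, -1\}$ and $v_i, v_j > 0$ for all i and j, the above equation is equivalent to the following two equations:

$$\prod_{i \in I} u_i^{w_i} \cdot \prod_{j \in J} u_j^{y_j} = 1, \qquad \prod_{i \in I} v_i^{w_i + x_i} \cdot \prod_{j \in J} v_j^{y_j + z_j} = 1.$$

We thus have two systems of equations, one in $\{+1, -1\}$ and the other in $\mathbb{R}_{>0}$. Now we temporarily rewrite the solution with $u_i$ and $v_i$: $S \leftarrow S \circ \{s_i \mapsto u_i v_i\}_{i \in I}$.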
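The sign/magnitude split amounts to bookkeeping on exponents. The Python sketch below uses a hypothetical encoding and is not the actual implementation. Since u = ±1, only the parity of its exponent matters, and v collects the exponent $w_i + x_i$ (or $y_j + z_j$). A second pair of helpers recomputes the left-hand side both ways for concrete nonzero values, as a sanity check of the factorization.

```python
from fractions import Fraction

def split_gl1_equation(gl1_terms, det_terms):
    """Split prod s_i^w_i |s_i|^x_i * prod det(B_j)^y_j |det(B_j)|^z_j = 1.

    gl1_terms: {name of s_i: (w_i, x_i)}; det_terms: {name of B_j: (y_j, z_j)}.
    Returns (sign, magnitude): exponents of u (reduced mod 2, since u = +-1)
    and exponents of v > 0.
    """
    sign, magnitude = {}, {}
    for name, (a, b) in list(gl1_terms.items()) + list(det_terms.items()):
        sign["u_" + name] = a % 2          # u^a depends only on the parity of a
        magnitude["v_" + name] = a + b     # v^a * v^b = v^(a + b)
    return sign, magnitude

def lhs(values, gl1_terms, det_terms):
    """Left-hand side for concrete nonzero values of s_i and det(B_j)."""
    out = Fraction(1)
    for name, (a, b) in list(gl1_terms.items()) + list(det_terms.items()):
        t = Fraction(values[name])
        out *= t ** a * abs(t) ** b
    return out

def lhs_split(values, sign, magnitude):
    """The same quantity as (product of signs) * (product of magnitudes)."""
    out = Fraction(1)
    for key, e in sign.items():
        out *= (1 if values[key[2:]] > 0 else -1) ** e
    for key, e in magnitude.items():
        out *= abs(Fraction(values[key[2:]])) ** e
    return out

terms, dets = {"s1": (3, -1), "s2": (-1, 0)}, {"B": (1, 1)}
vals = {"s1": -3, "s2": 2, "B": Fraction(-1, 2)}   # values of s1, s2 and det(B)
sign, mag = split_gl1_equation(terms, dets)
print(sign, mag)
print(lhs(vals, terms, dets) == lhs_split(vals, sign, mag))   # True
```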

First consider the system in $\mathbb{R}_{>0}$. As long as there remains an equation involving a variable $v_i$, which originates from a $\mathrm{GL}_1$ variable, we solve it for $v_i$ and compose the solution $v_i \mapsto \prod_{i' \ne i} v_{i'}^{p_{i'}} \prod_j v_j^{q_j}$ with S while applying it to the remaining equations. The denominators of fractional exponents (i.e., $p_{i'}, q_j \in \mathbb{Q} \setminus \mathbb{Z}$) can be cleared similarly to the case of $\mathrm{T}_2$ equations. If all the equations in $\mathbb{R}_{>0}$ are solved this way, then S is the most general solution. Otherwise, there remain one or more equations of the form $\prod_{j \in J'} |\det B_j|^{d_j} = 1$ for some $J' \subseteq J$ and $\{d_j\}_{j \in J'}$. This is the only case where we may miss some invariance of a formula; in general, we cannot express the most general solution to this equation only with index variables of sort $\mathrm{GL}_k$ and $\mathrm{O}_k$. We make a compromise here and are satisfied with a less general solution $S \circ \{B_j \mapsto E\}_{j \in J'}$. Fortunately, this does not frequently happen in practice: we made this compromise on only three out of 533 problems used in the experiment. We expect that having more sorts, e.g., $\mathrm{SL}^{\pm}_k = \{M \in \mathrm{GL}_k \mid |\det M| = 1\}$, in the language of index expressions might help here, but leave this as future work.

Table 2. Decomposition of $\mathrm{GL}_2/\mathrm{GL}_3$ equation $\sigma_i B_i = \sigma_j B_j$ (s: a fresh variable)

  Combination of head variables    Solution in GL≥2    Equations in GL1
  B_i = B_j                        none                σ_i = σ_j
  B_i : O_k ∧ B_j = E              B_i ↦ sE            sσ_i = σ_j, |s| = 1
  B_i : GL_k ∧ B_j = E             B_i ↦ sE            sσ_i = σ_j
  B_i : O_k ∧ B_j : O_k            B_i ↦ sB_j          sσ_i = σ_j, |s| = 1
  B_i : GL_k ∧ B_j : O_k           B_i ↦ sB_j          sσ_i = σ_j
  B_i : GL_k ∧ B_j : GL_k          B_i ↦ sB_j          sσ_i = σ_j

The system in {+1, −1} is processed analogously to that in R_{>0}. Finally, by restoring {u_i, v_i}_{i∈I} and {u_j, v_j}_{j∈J} in the solution S to their original forms, e.g., u_i ↦ s_i |s_i|^{-1}, we obtain a solution to the initial set of equations in terms of the variables of sort GL_k and O_k.


4.3 Type Reconstruction for Pre-defined Symbols with Axioms 


We incrementally determined the indexed types of the pre-defined symbols according to the hierarchy of their definitions. We first constructed a directed acyclic graph wherein the nodes are the pre-defined symbols and the edges represent the dependencies between their definitions. We manually assigned an indexed type to the symbols without defining axioms (e.g., + : R → R → R) and initialized Γ_ops with them. We then reconstructed the indexed types of the other symbols in a topological order of the graph. After reconstructing the type of each symbol, we added the symbol with its inferred type to Γ_ops.
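The reconstruction order described above can be sketched with Python's standard `graphlib` module (Python 3.9+). This is a minimal illustration, not the authors' implementation: the dependency map, the manually typed symbols, and the `infer` callback standing in for the actual type reconstruction are hypothetical placeholders.

```python
from graphlib import TopologicalSorter

def reconstruct_in_topological_order(depends_on, manual_types, infer):
    """Fill a symbol-to-type table in dependency order.

    depends_on:   symbol -> set of symbols used in its defining axioms
    manual_types: types assigned by hand to symbols without defining axioms
    infer:        hypothetical callback (symbol, table) -> inferred type
    """
    table = dict(manual_types)  # plays the role of Gamma_ops
    # static_order() yields each symbol after all symbols it depends on,
    # and raises graphlib.CycleError if the definitions are cyclic.
    for symbol in TopologicalSorter(depends_on).static_order():
        if symbol not in table:
            table[symbol] = infer(symbol, table)
    return table
```

Because each symbol is visited only after its dependencies, `infer` always sees the types of every symbol occurring in the defining axiom.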

For some of the symbols, type reconstruction does not go as well as we hope. 
For example, the following axiom defines the symbol midpoint: 


$$\forall p_1, p_2.\ \bigl(\mathrm{midpoint}(p_1, p_2) = \tfrac{1}{2} \cdot (p_1 + p_2)\bigr).$$


At the beginning of the type reconstruction of midpoint, the types of the symbols 
in the axiom are instantiated as follows: 


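$$
\begin{aligned}
\mathrm{midpoint} &: \mathrm{Vec}(\beta_1, \tau_1) \to \mathrm{Vec}(\beta_2, \tau_2) \to \mathrm{Vec}(\beta_3, \tau_3)\\
\cdot &: \mathrm{R}(s_1) \to \mathrm{Vec}(B_1, 0) \to \mathrm{Vec}(s_1 B_1, 0)\\
+ &: \mathrm{Vec}(B_2, t_1) \to \mathrm{Vec}(B_2, t_2) \to \mathrm{Vec}(B_2, t_1 + t_2).
\end{aligned}
$$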


The derived equations between the index expressions are as follows: 


$$\{B_2 = \beta_1,\ B_2 = \beta_2,\ B_1 = B_2,\ \beta_3 = s_1 B_1,\ s_1 = 1,\ t_1 = \tau_1,\ t_2 = \tau_2,\ 0 = t_1 + t_2,\ \tau_3 = 0\}.$$
By solving these equations, we obtain the indexed type of midpoint as follows:
$$\mathrm{midpoint} : \forall B_1{:}\mathrm{GL}_2.\ \forall t_1{:}\mathrm{T}_2.\ \mathrm{Vec}(B_1, t_1) \to \mathrm{Vec}(B_1, -t_1) \to \mathrm{Vec}(B_1, 0).$$
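The solution can be traced step by step. The following worked derivation spells out how the derived equations determine each index of midpoint's type (the name B for the common value of the GL_2 indices is ours):

$$
\begin{aligned}
&B_2 = \beta_1,\ B_2 = \beta_2,\ B_1 = B_2 &&\Longrightarrow\ \beta_1 = \beta_2 = B_1 = B_2 =: B\\
&s_1 = 1,\ \beta_3 = s_1 B_1 &&\Longrightarrow\ \beta_3 = B\\
&t_1 = \tau_1,\ t_2 = \tau_2,\ 0 = t_1 + t_2 &&\Longrightarrow\ \tau_2 = -\tau_1\\
&\tau_3 = 0 &&\Longrightarrow\ \mathrm{midpoint} : \mathrm{Vec}(B, \tau_1) \to \mathrm{Vec}(B, -\tau_1) \to \mathrm{Vec}(B, 0)
\end{aligned}
$$

Renaming B to B_1 and τ_1 to t_1 and generalizing over both gives the type stated above.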


This type indicates that the midpoint of any two points P and Q remains the same when we move P and Q to P + t_1 and Q − t_1, respectively, for any t_1 ∈ R^2. While this is not wrong, the following type is more useful for our purpose:


$$\mathrm{midpoint} : \forall B{:}\mathrm{GL}_2.\ \forall t{:}\mathrm{T}_2.\ \mathrm{Vec}(B, t) \to \mathrm{Vec}(B, t) \to \mathrm{Vec}(B, t). \qquad (1)$$


To such symbols, we manually assigned a more appropriate type.²

In the current system, 945 symbols have a type that includes indices. We manually assigned types to the 255 symbols that have no defining axioms. For 203 symbols, we manually overrode the inferred type, as in the case of midpoint. The types of the remaining 487 symbols were derived through type reconstruction.


5 Variable Elimination Based on Invariance 


In this section, we first provide an example of the variable elimination procedure based on invariance. We then describe the top-level algorithm of the variable elimination, which takes a formula as input and eliminates some of its quantified variables by utilizing the invariance indicated by an index variable. Finally, we list the elimination rules for each sort of index variable.


5.1 Example of Variable Elimination Based on Invariance 


Let us consider again the proof of the existence of the centroid of a triangle. For 
triangle ABC, the configuration of the midpoints P, Q, R of the three sides and 
the centroid G is described by the following formula: 


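$$
\Psi(A, B, C, P, Q, R, G) :=
\left(
\begin{aligned}
&P = \mathrm{midpoint}(B, C) \wedge \mathrm{on}(G, \mathrm{segment}(A, P))\ \wedge\\
&Q = \mathrm{midpoint}(C, A) \wedge \mathrm{on}(G, \mathrm{segment}(B, Q))\ \wedge\\
&R = \mathrm{midpoint}(A, B) \wedge \mathrm{on}(G, \mathrm{segment}(C, R))
\end{aligned}
\right)
$$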


where on(X, Y) stands for the inclusion of point X in a geometric object Y, and segment(X, Y) stands for the line segment between points X and Y. Let φ denote the existence of the centroid (and the three midpoints):
$$\varphi(A, B, C) := \exists G.\ \exists P.\ \exists Q.\ \exists R.\ \Psi(A, B, C, P, Q, R, G).$$
Our goal is to prove ∀A. ∀B. ∀C. φ(A, B, C).


2 The awkwardness of the type inferred for midpoint is the price paid for the efficiency of type reconstruction: it stems from the fact that we ignore the linear space structure of T_2 (and, also, we do not posit T_1 (≅ R) as the second index of type R). Otherwise, type reconstruction would come closer to a search for an invariance on the algebraic representation of the problems and the defining axioms. Hence 1/2 · (t + t) = t is not deduced for t : T_2, although it is necessary to infer the type in Eq. (1).
The functions midpoint, on, and segment are invariant under translations and general linear transformations. The reconstruction algorithm hence derives
$$\beta{:}\mathrm{GL}_2, \tau{:}\mathrm{T}_2\ ;\ A : \mathrm{Vec}(\beta, \tau), B : \mathrm{Vec}(\beta, \tau), C : \mathrm{Vec}(\beta, \tau) \vdash \varphi(A, B, C) : \mathrm{Bool}.$$


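By the abstraction theorem, this judgement implies the invariance of the proposition φ(A, B, C) under arbitrary affine transformations:
$$\forall g \in \mathrm{GL}_2.\ \forall t \in \mathrm{T}_2.\ \forall A, B, C.\ \varphi(A, B, C) \Leftrightarrow \varphi(t \circ g \circ A,\ t \circ g \circ B,\ t \circ g \circ C).$$
First, by considering the case where g is the identity, we have
$$\forall t \in \mathrm{T}_2.\ \forall A, B, C.\ \varphi(A, B, C) \Leftrightarrow \varphi(t \circ A,\ t \circ B,\ t \circ C). \qquad (2)$$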


Using this, we are going to verify ∀B, C. φ(0, B, C) ⇔ ∀A, B, C. φ(A, B, C), from which we know that we only have to prove ∀B, C. φ(0, B, C).

Suppose that ∀B, C. φ(0, B, C) holds. Since T_2 acts transitively on R^2, for any A ∈ R^2 there exists t ∈ T_2 such that t ∘ 0 = A. Furthermore, for any B, C ∈ R^2, by instantiating ∀B, C. φ(0, B, C) with B ↦ t^{-1} ∘ B and C ↦ t^{-1} ∘ C, we have φ(0, t^{-1} ∘ B, t^{-1} ∘ C). By Eq. (2), we obtain φ(t ∘ 0, t ∘ t^{-1} ∘ B, t ∘ t^{-1} ∘ C), which is equivalent to φ(A, B, C). Since A, B, C were arbitrary, we have proved
$$\forall B, C.\ \varphi(0, B, C) \Rightarrow \forall A, B, C.\ \varphi(A, B, C).$$
The converse is trivial. We have thus proved ∀B, C. φ(0, B, C) ⇔ ∀A, B, C. φ(A, B, C).

The simplified formula, ∀B, C. φ(0, B, C), is still invariant under the simultaneous action of GL_2 on B and C. Hence, by applying the type reconstruction again, we have β : GL_2 ; B : Vec(β, 0), C : Vec(β, 0) ⊢ φ(0, B, C) : Bool. This implies the following invariance: ∀g ∈ GL_2. ∀B, C. φ(0, B, C) ⇔ φ(0, g ∘ B, g ∘ C).

We now utilize it to eliminate the remaining variables B and C. Although it is tempting to 'fix' B and C at, e.g., e_1 := (1, 0) and e_2 := (0, 1), respectively, doing so incurs some loss of generality. For instance, when B is at the origin, there is no way to move B to e_1 by any g ∈ GL_2. We consider four cases:


1. B and C are linearly independent,
2. B ≠ 0, and B and C are linearly dependent,
3. C ≠ 0, and B and C are linearly dependent, and
4. B and C are both at the origin.


For each of these cases, we can find a suitable transformation in GL_2 as follows:

1. There exists g_1 ∈ GL_2 s.t. g_1 ∘ B = e_1 and g_1 ∘ C = e_2,
2. There exist g_2 ∈ GL_2 and r ∈ R s.t. g_2 ∘ B = e_1 and g_2 ∘ C = r e_1,
3. There exist g_3 ∈ GL_2 and r' ∈ R s.t. g_3 ∘ C = e_1 and g_3 ∘ B = r' e_1, and
4. We only have to know whether or not φ(0, 0, 0) holds.


By a similar argument to the one for translation invariance, we have
$$\forall B, C.\ \varphi(0, B, C) \Leftrightarrow \varphi(0, e_1, e_2) \wedge \forall r.\ \varphi(0, e_1, r e_1) \wedge \forall r'.\ \varphi(0, r' e_1, e_1) \wedge \varphi(0, 0, 0).$$
Thus, we eliminated all four coordinate values (i.e., the x and y coordinates of B and C) in the first and the last case, and three of them in the other two cases.
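The four-case normalization can be made concrete. Below is a small Python sketch using exact rational arithmetic from the standard `fractions` module; the function names are ours, and the particular choice of g (inverting the matrix whose columns are B and C, or B completed by a perpendicular vector) is just one convenient construction, since any g ∈ GL_2 with the stated property would do.

```python
from fractions import Fraction

def inverse(m):
    """Inverse of a 2x2 matrix ((a, b), (c, d)) with nonzero determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def apply(m, v):
    """Matrix-vector product for a 2x2 matrix and a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def normalize(B, C):
    """Return (case, g, r) following the four cases for B, C in R^2."""
    B = tuple(Fraction(t) for t in B)
    C = tuple(Fraction(t) for t in C)
    zero = (0, 0)
    if B[0] * C[1] - B[1] * C[0] != 0:
        # Case 1: the matrix with columns B, C is invertible; g1 is its inverse,
        # so g1 B = e1 and g1 C = e2.
        return 1, inverse(((B[0], C[0]), (B[1], C[1]))), None
    if B != zero:
        # Case 2: complete B with the perpendicular vector w to a basis.
        w = (-B[1], B[0])
        g = inverse(((B[0], w[0]), (B[1], w[1])))
        r = C[0] / B[0] if B[0] != 0 else C[1] / B[1]   # C = r * B
        return 2, g, r
    if C != zero:
        # Case 3: reached here only with B = 0, so r' = 0.
        w = (-C[1], C[0])
        g = inverse(((C[0], w[0]), (C[1], w[1])))
        return 3, g, Fraction(0)
    return 4, None, None  # Case 4: B = C = 0, nothing left to normalize
```

For example, `normalize((2, 4), (3, 6))` falls into case 2 with r = 3/2, and the returned g maps B to e_1 and C to r e_1.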
5.2 Variable Elimination Algorithm


The variable elimination algorithm works as follows. We traverse the formula of a problem in a top-down order and, for each subformula of the form
$$Q x_1.\ Q x_2.\ \cdots\ Q x_n.\ \varphi(x_1, x_2, \ldots, x_n, \boldsymbol{y}) \qquad (Q \in \{\forall, \exists\}),$$
where y = y_1, ..., y_m are the free variables, we apply the type reconstruction procedure to φ(x_1, x_2, ..., x_n, y) and derive a judgement Δ; Γ, x_1:T_1, ..., x_n:T_n ⊢ φ(x_1, ..., x_n, y) : Bool. We then choose an index variable i that appears at least once in T_1, ..., T_n but in none of the types of y. This means that the transformation signified by i acts on some of {x_1, ..., x_n} but on none of y. We select from {x_1, ..., x_n} one or more variables whose types include i and are of the form R(σ) or Vec(β, τ). Suppose that we select x_1, ..., x_l. Then we know that the judgement Δ; Γ, x_1:T_1, ..., x_l:T_l ⊢ Q x_{l+1}. ⋯ Q x_n. φ(x_1, ..., x_n, y) : Bool also holds. We then eliminate (or add restrictions on) the bound variables x_1, ..., x_l by one of the lemmas in Sect. 5.3, according to the sort of i. After the elimination, the procedure is recursively applied to the resulting formula and its subformulas.
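The selection step can be sketched as follows. This is a simplified illustration under our own assumptions: each type is abstracted to its constructor ("R", "Vec", or anything else) together with the set of index variables occurring in it, and when several index variables qualify we simply take the first in sorted order, since the text does not specify how i is chosen.

```python
def choose_index_variable(bound, free):
    """Pick an index variable i and the bound variables x_1, ..., x_l to eliminate.

    bound, free: variable -> (kind, set of index variables in its type),
    where kind is "R", "Vec", or another type constructor.
    Returns (i, [selected variables]) or None if no suitable i exists.
    """
    free_indices = set()
    for _, indices in free.values():
        free_indices |= indices
    for _, indices in bound.values():
        # i must occur in some T_k but in none of the types of y
        for i in sorted(indices - free_indices):
            selected = [x for x, (kind, ix) in bound.items()
                        if i in ix and kind in ("R", "Vec")]
            if selected:
                return i, selected
    return None
```

With bound variables A, B : Vec(β, τ) and a free variable C : Vec(β, 0), only τ qualifies, and both A and B are selected for elimination.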


5.3 Variable Elimination Rules 
We now present how to eliminate variables based on a judgement of the form
$$\Delta; \Gamma, x_1 : T_1, \ldots, x_n : T_n \vdash \psi(x_1, \ldots, x_n, \boldsymbol{y}) : \mathrm{Bool},$$
where T_1, ..., T_n include no index variables other than i; Γ = y_1:U_1, ..., y_m:U_m is a typing context for y = y_1, ..., y_m; and U_1, ..., U_m do not include i. Note that we can obtain a judgement of this form by the procedure in Sect. 5.2 and by substituting the unity of the appropriate sort for all index variables other than i in T_1, ..., T_n.

We provide the variable elimination rules as lemmas, one for each sort of i. They state the rules for variables bound by ∀; the rules for ∃ are analogous. In stating the lemmas, we suppress Δ and Γ in the judgement and y in ψ for brevity, but we still assume that the above-mentioned conditions hold.

Some complication arises from the fact that if k ≠ l, then T_k and T_l may be indexed with different expressions of i. We thus need to consider potentially different transformations ⟦T_1⟧(i), ..., ⟦T_n⟧(i) applied simultaneously to x_1, ..., x_n. Please refer to the supplementary material on the first author's web page for a general argument behind the rules and the proofs of the lemmas (https://researchmap.jp/mtzk/?lang=en).


T_k: The following lemma states that, as we saw in Sect. 5.1, we only have to consider the truth of a formula ψ(x) at x = 0 if ψ(x) is translation-invariant.
Lemma 1. If x : Vec(1, τ(t)) ⊢ ψ(x) : Bool holds for t : T_k (k ∈ {2, 3}), then ∀x. ψ(x) ⇔ ψ(0).


O_2: The following lemma means that we may assume x is on the x-axis if ψ(x) is invariant under rotation and reflection.

Lemma 2. If x : Vec(β(O), 0) ⊢ ψ(x) : Bool holds for O : O_2, then ∀x. ψ(x) ⇔ ∀r. ψ(r e_1).
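A minimal sketch of the geometric fact behind Lemma 2, assuming plain floating-point arithmetic: for any x ∈ R^2 it builds a rotation (one particular element of O_2) that maps x onto the x-axis. The function name is ours.

```python
import math

def rotation_onto_x_axis(x):
    """Return (O, r) with O a 2x2 rotation matrix in O_2 and O x = (r, 0)."""
    angle = math.atan2(x[1], x[0])
    c, s = math.cos(-angle), math.sin(-angle)
    O = ((c, -s), (s, c))              # rotation by -angle
    return O, math.hypot(x[0], x[1])   # r = |x|
```

For x = (3, 4) it returns r = 5 and a matrix sending x to (5, 0) up to rounding; for x = 0 it returns the identity and r = 0.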


O_3: A judgement of the following form implies different kinds of invariance according to β_1 and β_2:
$$x_1 : \mathrm{Vec}(\beta_1(O), 0),\ x_2 : \mathrm{Vec}(\beta_2(O), 0) \vdash \psi(x_1, x_2) : \mathrm{Bool}. \qquad (3)$$
In any case, we may assume that x_1 is on the x-axis and x_2 is on the xy-plane for proving ∀x_1, x_2. ψ(x_1, x_2), as stated in the following lemma.


Lemma 3. If judgement (3) holds for O : O_3, then
$$\forall x_1.\ \forall x_2.\ \psi(x_1, x_2) \Leftrightarrow \forall p, q, r \in \mathrm{R}.\ \psi(p e_1,\ q e_1 + r e_2).$$
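Lemma 3 rests on a similar normalization in three dimensions. One convenient way to construct the required element of O_3 is a complete QR decomposition, here via NumPy's `numpy.linalg.qr`; this choice of algorithm is ours, not the paper's. If [x_1 x_2] = QR with Q orthogonal and R upper triangular, then O = Q^T maps x_1 to p e_1 and x_2 to q e_1 + r e_2.

```python
import numpy as np

def normalize_pair(x1, x2):
    """Return (O, (p, q, r)) with O in O_3, O x1 = p e1 and O x2 = q e1 + r e2."""
    A = np.column_stack([x1, x2])               # 3x2 matrix [x1 x2]
    Q, R = np.linalg.qr(A, mode="complete")     # A = Q R, Q orthogonal (3x3)
    O = Q.T                                     # hence O A = R, upper triangular
    return O, (R[0, 0], R[0, 1], R[1, 1])
```

The zeros below the diagonal of R are exactly the coordinates that Lemma 3 allows us to drop.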


GL_1: For s : GL_1, a judgement x : R(σ(s)) ⊢ ψ(x) : Bool implies that either

– ψ(x) is invariant under change of sign, i.e., ψ(x) ⇔ ψ(−x),
– ψ(x) is invariant under positive scaling, i.e., ψ(x) ⇔ ψ(fx) for all f > 0, or
– ψ(x) is invariant under arbitrary scaling, i.e., ψ(x) ⇔ ψ(fx) for all f ≠ 0.


The form of σ determines the type of invariance. The following lemma summarizes how we can eliminate or restrict a variable in these cases.

Lemma 4. Let σ(s) = s^e · |s|^f (e ≠ 0 or f ≠ 0) and suppose a judgement x : R(σ(s)) ⊢ ψ(x) : Bool holds for s : GL_1. We have three cases:

1. if e + f = 0, then ∀x. ψ(x) ⇔ ∀x ≥ 0. ψ(x); otherwise,
2. if e is an even number, then ∀x. ψ(x) ⇔ ψ(1) ∧ ψ(0) ∧ ψ(−1), and
3. if e is an odd number, then ∀x. ψ(x) ⇔ ψ(1) ∧ ψ(0).
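The case distinction of Lemma 4 can be read off from the range of σ. Writing σ(s) = sign(s)^e · |s|^(e+f) for integer e and f, the sketch below (function names ours) classifies the reduction. It also makes explicit the degenerate case e + f = 0 with e even, where σ is constantly 1 and no reduction applies; this case is our addition. For the sign-invariant case it checks x ≥ 0, which is what ψ(x) ⇔ ψ(−x) justifies.

```python
def sigma(s, e, f):
    """sigma(s) = s^e * |s|^f for s != 0 (integer exponents)."""
    return s ** e * abs(s) ** f

def gl1_reduction(e, f):
    """How ∀x. ψ(x) reduces when x : R(sigma(s)) with s : GL_1.

    Since sigma(s) = sign(s)^e * |s|^(e+f), its range over s != 0
    determines which transformations of x leave ψ invariant.
    """
    if e == 0 and f == 0:
        raise ValueError("Lemma 4 assumes e != 0 or f != 0")
    odd = e % 2 != 0
    if e + f == 0:
        # sigma(s) = sign(s)^e: x -> -x if e is odd, identity if e is even
        return ("x >= 0", None) if odd else ("no reduction", None)
    if odd:
        # sigma ranges over all nonzero reals: arbitrary scaling
        return ("points", [1, 0])
    # sigma ranges over the positive reals: positive scaling
    return ("points", [1, 0, -1])
```

For instance, e = 1, f = 0 (the plain type R(s)) yields the two test points 1 and 0, while e = 2, f = 0 yields 1, 0, and −1.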


GL_2: For B : GL_2, a judgement of the following form implies different kinds of invariance of ψ(x_1, x_2), depending on the form of β_1 and β_2:
$$x_1 : \mathrm{Vec}(\beta_1(B), 0),\ x_2 : \mathrm{Vec}(\beta_2(B), 0) \vdash \psi(x_1, x_2) : \mathrm{Bool}. \qquad (4)$$


The following lemma summarizes how we eliminate the variables in each case. 


Lemma 5. Let β_j(B) = det(B)^{e_j} · |det(B)|^{f_j} · B and g_j = e_j + f_j (j ∈ {1, 2}). If judgement (4) holds, then, letting Ψ_0 := ψ(0, 0) ∧ ∀r. ψ(r e_1, e_1) ∧ ∀r. ψ(e_1, r e_1) and Ψ := ∀x_1. ∀x_2. ψ(x_1, x_2), the following equivalences hold:

1. If g_1 + g_2 + 1 ≠ 0 and
   – if e_1 + e_2 is an even number, then Ψ ⇔ Ψ_0 ∧ ψ(e_1, e_2);
   – if e_1 + e_2 is an odd number, then Ψ ⇔ Ψ_0 ∧ ψ(e_1, e_2) ∧ ψ(e_1, −e_2).
2. If g_1 + g_2 + 1 = 0, then Ψ ⇔ Ψ_0 ∧ ∀r. ψ(r e_1, e_2).


A similar lemma holds for the invariances indicated by an index variable of sort 
GL3. We refrain from presenting it for space reasons. 
Table 3. Results on All RCF Problems in TOROBOMATH Benchmark

Division  #Prblms   ALGIDX            BASELINE
                    Solved   Time     Solved   Time
IMO       116       28%      51.7s    16%      19.7s
Univ      243       69%      22.1s    62%      26.8s
Chart     174       68%      9.7s     62%      12.0s
All       533       60%      20.5s    52%      20.6s

Table 4. Results on RCF Problems with Invariance Detected and Variable Eliminated

Division  #Prblms   ALGIDX            BASELINE          Speed-up
                    Solved   Time     Solved   Time
IMO       77        19%      91.3s    1%       3.6s     23%
Univ      49        57%      31.0s    33%      62.7s    495%
Chart     77        49%      14.3s    36%      26.0s    529%
All       203       40%      34.3s    22%      38.5s    505%
Fig. 5. Comparison of Elapsed Time with and without the Invariance Detection based
on AITs (Left: All Problems; Right: Problems Solved within 60s) 


6 Experiment 


We evaluated the effectiveness of the proposed method on the pre-university math problems in the TOROBOMATH benchmark. We used a subset of the problems that can be naturally expressed (by humans) in the language of RCF. Most of them are in either geometry or algebra. Note that the formalization was done in the language introduced in Sect. 2, not directly in the language of RCF. The problems are divided according to their source: IMO problems were taken from past International Mathematical Olympiads, Univ problems were from entrance exams of Japanese universities, and Chart problems were from a popular math practice book series. Please refer to another paper [16] on the TOROBOMATH benchmark for the details of the problems.

The type reconstruction and formula simplification procedures presented in Sect. 4 and Sect. 5 were implemented as a pre-processor of the formalized problems. The time spent on preprocessing was almost negligible (0.76 s per problem on average) compared to that for solving the problems.

We compared the TOROBOMATH system with and without the pre-processor (called ALGIDX and BASELINE, respectively, below). The BASELINE system is equipped with Iwane and Anai's invariance detection and simplification algorithm [10], which operates on the language of RCF, whereas ALGIDX is not. Thus, our evaluation should reveal the advantage of detecting and exploiting the invariance of a problem expressed in a language that directly encodes its geometric meaning.
Table 5. Percentage of Problems from which One or More Variables are Eliminated by the Rule for Each Sort

GL_1   T_2    O_2    GL_2   T_3   O_3   GL_3   any
22.3   27.4   26.1   1.7    6.6   7.5   0.0    38.1

Table 6. Most Frequent Invariance Types Detected and Eliminated

Invariance           (%)
GL_1, T_2, O_2       17.1
T_2, O_2             8.1
T_3, O_3             4.5
GL_1                 2.4
GL_1, T_3, O_3       2.1
T_2, GL_2            1.1


Table 3 presents the results on all problems. The solver was run on each problem with a time limit of 600 s. The table lists the number of problems, the percentage of the problems solved within the time limit, and the average wall-clock time spent on the solved problems. The number of solved problems increased significantly in the IMO division (from 16% to 28%). A modest improvement is observed in the other two divisions. Table 4 presents the results only on the problems in which at least one variable was eliminated by ALGIDX. The effect of the proposed method is clearly observed across all problem divisions, especially on IMO. On IMO, the average elapsed time on the problems solved by ALGIDX is longer than that of BASELINE; this is because ALGIDX solved more difficult problems within the time limit. In fact, the average speed-up by ALGIDX (last column in Table 4) is around 500% on Univ and Chart; i.e., on the problems solved by both, ALGIDX output the answer five times faster than BASELINE.

A curious fact is that both ALGIDX and BASELINE tended to need more time to solve the problems on which an invariance was detected and eliminated by ALGIDX (Time in Table 4) than the average over all solved problems (Time in Table 3). This suggests that problems having an invariance, or equivalently a symmetry, are harder for automatic solvers than those without one.

Figure 5 shows a comparison of the elapsed time for each problem. Each point represents a problem, and the x and y coordinates indicate the elapsed time to solve it (or to time out) by BASELINE and ALGIDX, respectively. We can see that many problems that were not solved by BASELINE within 600 s were solved within 300 s by ALGIDX. The speed-up is also observed on easier problems (those solved within 60 s), as can be seen in the right panel of Fig. 5.

Table 5 lists the fraction of problems on which one or more variables are eliminated based on the invariance indicated by an index variable of each sort. Table 6 provides the distribution of the combinations of the sorts of invariances detected and eliminated by ALGIDX.


7 Conclusion 


A method for automating w.l.o.g. arguments on geometry problems has been presented. It detects an invariance in a problem through type reconstruction in AIT and simplifies the problem utilizing the invariance. It was especially effective on harder problems, including past IMO problems. Our future work includes the exploration of a more elaborate language of index expressions that captures various kinds of invariance while keeping type inference tractable.
Abstract. We discuss the results of our work on heuristics for generating minimal synthetic tableaux. We present this proof method for classical propositional logic and its implementation in Haskell. Based on mathematical insights and exploratory data analysis, we define heuristics that allow building a tableau of optimal or nearly optimal size. The proposed heuristics were first tested on a data set with over 200,000 short formulas (length 12), then on 900 formulas of length 23. We describe the results of the data analysis and examine some tendencies. We also confront our approach with the pigeonhole principle.


Keywords: Synthetic tableau - Minimal tableau - Data analysis - 
Proof-search heuristics - Haskell - Pigeonhole principle 


1 Introduction 


The method of synthetic tableaux (ST, for short) is a proof method based entirely on direct reasoning, yet designed in a tableau format. The basic idea is that all the laws of logic, and only laws of logic, can be derived directly by cases from parts of some partition of the whole logical space. Hence an ST-proof of a formula typically starts with a division between 'p-cases' and '¬p-cases' and continues with further divisions, if necessary. The rest of the derivation consists in applying the so-called synthesizing rules that build complex formulas from their parts, i.e., subformulas and/or their negations. For example, if p holds, then every implication with p in the succedent holds, 'q → p' in particular; then 'p → (q → p)' also holds by the same argument. If ¬p is the case, then every implication with p in the antecedent holds, thus 'p → (q → p)' is settled. This kind of reasoning proves that 'p → (q → p)' holds in every possible case (unless we reject tertium non datur in the partition of the logical space). There are no indirect assumptions, no reductio ad absurdum, no assumptions that need to be discharged. The ST method needs no labels, and no derivation of a normal form (clausal form) is required.


This work was supported financially by National Science Centre, Poland, grant no 
2017/26/E/HS1/00127. 
© The Author(s) 2022 
In the case of Classical Propositional Logic (CPL, for short), the method may be viewed as a formalization of the truth-tables method. The assumption that p amounts to considering all Boolean valuations that make p true; considering ¬p exhausts the logical space. The number of cases to be considered corresponds to the number of branches of an ST, and it clearly depends on the number of distinct propositional variables in a formula; thus the upper bound for the complexity of an ST-search is the complexity of the truth-tables method. In the worst case this is exponential with respect to the number of variables, but for some classes of formulas truth tables behave better than standard analytic tableaux (see [4–7] for this diagnosis). However, the method of ST can perform better than truth tables, as shown by the example of 'p → (q → p)', where we do not need to partition the space of valuations into the q/¬q cases. The question, obviously, is how much better. The considerations presented in this paper aim at developing a quasi-experimental framework for answering it.

The ST method was introduced in [19], then extended to some non-classical logics in [20,22]. An adjustment to the first-order level was presented in [14]. The method has also found interesting applications in the domain of abduction [12,13]. On the propositional level, the ST method is both a proof- and
model-checking method, which means that one can examine the satisfiability of a formula A (equivalently, the non-validity of ¬A) and its falsifiability (equivalently, the satisfiability of ¬A) at the same time. Normally, one needs to derive a clausal form of both A and ¬A to check the two dual semantic cases (satisfiability and validity) with one of the quick methods, while the ST system is designed to examine both of them. Used wisely, this property can help limit the increase in complexity when verifying semantic properties.

For the purpose of optimizing the ST method, we created heuristics that lead to the construction of a variable ordering, a task similar to the one performed in research on Ordered Binary Decision Diagrams (OBDDs) and, more generally, on the Boolean satisfiability problem (SAT) [8,15]. In Sect. 3 we sketch a comparison of STs to OBDDs. Let us stress at this point, however, that the aim of our analysis remains proof-theoretical: the ST method is a 'full-blooded' proof method working on formulas of arbitrary representation. It has already been adjusted to first-order logic and to some non-classical logics, and it has a large scope of applications beyond satisfiability checking of clausal forms.

The optimization methods that we present are based on exploratory data analysis performed on millions of tableaux. Some aspects of the analysis are also discussed in the paper. The data are available at https://ddsuam.wordpress.com/software-and-data/.

Here is a plan of what follows. The next section introduces the ST method, 
Sect. 3 compares STs with analytic tableaux and with BDDs, and Sect. 4 presents 
the implementation in Haskell. In Sect. 5 we introduce the mathematical concepts 


1 On a side note, it is easy to show that the ST system is polynomially equivalent to 
system KE introduced in [4], as both systems contain cut. What is more, there is 
a strict analogy between the ST method and the inverse method (see [4,16]). The 
relation between ST and KI was examined by us in detail in Sect. 2 of [14]. 
needed to analyse heuristics for generating small tableaux. In Sect. 6 we describe the analysed data, and in Sect. 7 the obtained results. Section 8 confronts our approach with the pigeonhole principle, and Sect. 9 indicates plans for further research.


2 The Method of Synthetic Tableaux 


Language. Let L_CPL stand for the language of CPL with negation, ¬, and implication, →. Var = {p, q, r, ..., p_i, ...} is the set of propositional variables, and 'Form' stands for the set of all formulas of the language, where the notion of formula is understood in the standard way. A, B, C, ... will be used for formulas of L_CPL. Propositional variables and their negations are called literals. The length of a formula A is understood as the number of occurrences of characters in A, parentheses excluded.

Let A ∈ Form. We define the notion of a component of A as follows. (i) A is a component of A. (ii) If A is of the form '¬¬B', then B is a component of A. (iii) If A is of the form 'B → C', then '¬B' and C are components of A. (iv) If A is of the form '¬(B → C)', then B and '¬C' are components of A. (v) If C is a component of B and B is a component of A, then C is a component of A. (vi) Nothing else is a component of A. By 'Comp(A)' we mean the set of all components of A. For example, Comp(p → (q → p)) = {p → (q → p), ¬p, q → p, ¬q, p}. As we can see, a component of a formula is not the same as a subformula of a formula: ¬q is not a subformula of the law of antecedent, while q is a subformula but not a component. Components refer to the uniform notation as defined by Smullyan (see [18]), which is very convenient to use with a larger alphabet. Let us also observe that the association of Comp(A) with a Hintikka set is quite natural, although Comp(A) need not be consistent. In the sequel we shall also use 'Comp±(A)' as short for 'Comp(A) ∪ Comp(¬A)'.
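The inductive definition of components translates directly into a recursive function. Below is a minimal Python sketch in which formulas are encoded as nested tuples, a representation of our choosing that is unrelated to the authors' Haskell implementation; the recursion realizes the closure clause (v).

```python
def var(name):
    return ("var", name)

def neg(a):
    return ("not", a)

def imp(a, b):
    return ("imp", a, b)

def comp(a):
    """Comp(A): A itself plus the components from clauses (ii)-(iv), closed under (v)."""
    result = {a}
    if a[0] == "not" and a[1][0] == "not":      # (ii) ¬¬B gives B
        result |= comp(a[1][1])
    elif a[0] == "imp":                         # (iii) B → C gives ¬B and C
        result |= comp(neg(a[1])) | comp(a[2])
    elif a[0] == "not" and a[1][0] == "imp":    # (iv) ¬(B → C) gives B and ¬C
        result |= comp(a[1][1]) | comp(neg(a[1][2]))
    return result

def comp_pm(a):
    """The union Comp(A) ∪ Comp(¬A) used to restrict rule conclusions."""
    return comp(a) | comp(neg(a))
```

For B = p → (q → p), `comp(B)` returns exactly the five formulas listed in the example above, and q is not among them.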


Rules. The system of ST consists of the set of rules (see Table 1) and the notion of proof (see Definition 2). The rules can be applied in the construction of an ST for a formula A on the proviso that (a) the premises already occur on a given branch, and (b) the conclusion (conclusions, in the case of (cut)) of a particular application of the rule belongs (both belong) to Comp±(A). The only branching rule, called (cut) by analogy to its famous sequent-calculus formulation, is at the same time the only rule that needs no premises; hence every ST starts with an application of this rule. If its application creates branches with p_i and ¬p_i, then we say that the rule was applied with respect to p_i.

One of the nice properties of this method is that it is easy to keep every branch consistent: it is sufficient to restrict the applications of (cut) so that on every branch (cut) is applied with respect to a given variable p_i at most once. This ensures that p_i and ¬p_i never occur together on the same branch.

The notion of a proof is formalized by that of a tree. If T is a labelled tree, then by X_T we mean the set of its nodes, and by r_T we mean its root. Moreover, η_T is used for a function assigning labels to the nodes in X_T.
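Table 1. Rules of the ST system for L_CPL

$$
(\mathrm{r}_{\neg})\ \frac{A}{\neg\neg A}
\qquad
(\mathrm{r}^{1}_{\to})\ \frac{\neg A}{A \to B}
\qquad
(\mathrm{r}^{2}_{\to})\ \frac{B}{A \to B}
\qquad
(\mathrm{r}_{\neg\to})\ \frac{A \qquad \neg B}{\neg(A \to B)}
\qquad
(\mathrm{cut})\ \frac{}{p_i \ \mid\ \neg p_i}
$$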


Definition 1 (synthetic tableau). A synthetic tableau for a formula A is a finite labelled tree T generated by the above rules, such that η_T : X_T \ {r_T} → Comp±(A) and each leaf is labelled with A or with ¬A.

T is called consistent if the applications of (cut) are subject to the restriction 
defined above: there are no two applications of (cut) on the same branch with 
respect to the same variable. 

T is called regular provided that literals are introduced in the same order on 
each branch, otherwise T is called irregular. 

Finally, T is called canonical, if, first, it is consistent and regular, and second, 
it starts with an introduction of all possible literals by (cut) and only after that 
the other rules are applied on the created branches. 


In the above definition we have used the notion of literals introduced in the 
same order on each branch. It seems sufficiently intuitive at the moment, so we 
postpone the clarification of this notion until the end of this section. 


Definition 2 (proof in ST system). A synthetic tableau T for a formula 
A is a proof of A in the ST system iff each leaf of T is labelled with A. 


Theorem 1. (soundness and completeness, see [21]). A formula A is valid 
in CPL iff A has a proof in the ST-system. 


Example 1. Below we present two different STs for one formula: B = p → (q → p). Each of them is consistent and regular. Also, each of them is a proof of the formula in the ST system.

T1 (two branches):
  1. p, 2. q → p, 3. p → (q → p)
  4. ¬p, 5. p → (q → p)

T2 (three branches; node 1 is shared by the first two):
  1. q, 2. p, 3. q → p, 4. B
  1. q, 5. ¬p, 6. B
  7. ¬q, 8. q → p, 9. B

In T1, 2 comes from 1 by r^2_→; similarly, 3 comes from 2 by r^2_→, and 5 comes from 4 by r^1_→. In T2, nothing can be derived from 1, hence the application of (cut) with respect to p is the only possible move. The numbering of the nodes is not part of the ST.
There are at least two important size measures used with respect to trees: the number of nodes and the number of branches. As witnessed by our data, there is a very high overall correlation between the two measures; we have thus used only one of them, the number of branches, in further analysis. Among various STs for the same formula there can be those of smaller and those of bigger size. An ST of a minimal size is called optimal. In the above example, T1 is an optimal ST for B. Let us also observe that a formula can have many STs of the same size; in particular, there can be many optimal STs.
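Branch counts such as these can be explored mechanically. The sketch below is our own illustration, not the authors' Haskell implementation. It assumes that on a branch, A (respectively ¬A) can be synthesized exactly when A evaluates to true (respectively false) under strong Kleene three-valued evaluation of the literals on that branch, which matches the behaviour of the rules for → and ¬ in the examples. Under that assumption, it counts the branches of a regular, consistent ST that applies (cut) in a fixed variable order, cutting only while the formula is still unsettled. Formulas are nested tuples such as ("imp", ("var", "p"), ("var", "q")).

```python
def evaluate(a, val):
    """Strong Kleene value of a under a partial valuation (None = not settled)."""
    if a[0] == "var":
        return val.get(a[1])
    if a[0] == "not":
        v = evaluate(a[1], val)
        return None if v is None else not v
    left, right = evaluate(a[1], val), evaluate(a[2], val)
    if left is False or right is True:   # ¬B or C yields B → C
        return True
    if left is True and right is False:  # B and ¬C yield ¬(B → C)
        return False
    return None

def count_branches(a, instruction, val=None):
    """Branches of a regular, consistent ST for a that follows the instruction.

    instruction must list all variables of a; a branch ends as soon as
    a or ¬a is settled.
    """
    val = {} if val is None else val
    if evaluate(a, val) is not None:
        return 1
    p, rest = instruction[0], instruction[1:]   # next (cut) in the fixed order
    return sum(count_branches(a, rest, {**val, p: b}) for b in (True, False))
```

For B = p → (q → p), the order p, q gives 2 branches (as in T1) and the order q, p gives 3 (as in T2).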


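Example 2. Two possible canonical synthetic tableaux for B = p → (q → p). Each of them is regular and consistent, but clearly not optimal (cf. T1).

T3 ((cut) first with respect to p, then with respect to q; four branches):
  p, q, q → p, B
  p, ¬q, q → p, B
  ¬p, q, B
  ¬p, ¬q, B

T4 ((cut) first with respect to q, then with respect to p; four branches):
  q, p, q → p, B
  q, ¬p, B
  ¬q, p, q → p, B
  ¬q, ¬p, B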


In the case of formulas with at most two distinct variables, regularity is a trivial property. Here is an example with three variables.


Example 3. T5 is an irregular ST for the formula C = (p → ¬q) → ¬(r → p), i.e., variables are introduced in various orders on different branches. T6 is an example of an inconsistent ST for C, i.e., there are two applications of (cut) with respect to p on one branch, which results in a branch carrying both p and ¬p (the blue one). The whole right subtree of T5, starting with ¬p, is repeated twice in T6, where it is symbolized with the letter T*. Let us observe that ¬¬(r → p) is a component of ¬C due to clause (iv) defining the concept of component.


[Figure: synthetic tableaux T5 (left) and T6 (right) for C]


On the level of CPL we can use only consistent STs while still having a complete calculus (for details see [19,21]). An analogue of closing a branch of an analytic tableau for formula A is, in the case of an ST, ending a branch with A synthesized. And the fact that an ST for A has a consistent branch ending with ¬A witnesses the satisfiability of ¬A. The situation concerning consistency of branches is slightly different, however, in the formalization of first-order logic presented in [14], as restricting the calculus to consistent STs produces an incomplete formalization.

Finally, let us introduce some auxiliary terminology to be used in the sequel. Suppose T is an ST for a formula A and ℬ is a branch of T. Literals occur on ℬ in an order set by the applications of (cut); suppose that it is ⟨±p_1, ..., ±p_n⟩, where '±' is a negation sign or no sign. In this situation we call the sequence o = ⟨p_1, ..., p_n⟩ the order on ℬ. It can happen that o contains all variables that occur in A, or that some of them are missing. Suppose that q_1, ..., q_m are all of (and only) the distinct variables occurring in A. Each permutation of q_1, ..., q_m will be called an instruction for a branch of an ST for A. Further, we will say that the order o on ℬ complies with an instruction I iff either o = I or o constitutes a proper initial segment of I. Finally, ℐ is an instruction for the construction of T if ℐ is a set of instructions for branches of an ST for A such that, for each branch of T, the order on the branch complies with some element of ℐ.

Let us observe that in the case of a regular ST the set containing one instruction for a branch constitutes the whole instruction for the ST, as this instruction describes all the branches. Let us turn to examples. T5 from Example 3 has four branches with the following orders (from the left): (p, q), (p, q), (p, r), (p, r). On the other hand, there are six permutations of p, q, r, and hence six possible instructions for branches of an arbitrary ST for the discussed formula. Order (p, q) complies with instruction (p, q, r), and order (p, r) complies with instruction (p, r, q). The set {(p, q, r), (p, r, q)} is an instruction for the construction of an ST for C; more specifically, it is an instruction for the construction of T5.
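The notions of order, compliance and instruction for the construction of an ST are purely combinatorial, so they can be checked mechanically. The following is a minimal sketch, assuming that orders and instructions are given as tuples of variable names; the function names `complies` and `is_instruction_for` are hypothetical and not taken from the authors' implementation.

```python
from typing import Iterable, Sequence


def complies(order: Sequence[str], instruction: Sequence[str]) -> bool:
    """An order complies with an instruction iff it equals the instruction
    or is a proper initial segment of it."""
    return len(order) <= len(instruction) and tuple(instruction[:len(order)]) == tuple(order)


def is_instruction_for(instructions: Iterable[Sequence[str]],
                       branch_orders: Iterable[Sequence[str]],
                       variables: Iterable[str]) -> bool:
    """Check that every element of `instructions` is a permutation of the
    variables of A and that every branch order complies with some element."""
    instructions = [tuple(i) for i in instructions]
    variables = sorted(variables)
    if any(sorted(i) != variables for i in instructions):
        return False
    return all(any(complies(o, i) for i in instructions) for o in branch_orders)


# Orders on the four branches of T5 (Example 3), from the left.
t5_orders = [("p", "q"), ("p", "q"), ("p", "r"), ("p", "r")]
print(is_instruction_for({("p", "q", "r"), ("p", "r", "q")}, t5_orders, "pqr"))  # True
print(is_instruction_for({("p", "q", "r")}, t5_orders, "pqr"))                  # False
```

For a regular ST, the set `instructions` is a singleton. This mirrors the remark above that one instruction for a branch suffices there.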


3 ST, Analytic Tableaux, BDDs, and SAT Solvers 


The analogy between STs and analytic tableaux sketched in the last paragraph of the previous section breaks down at two points. First, to repeat: the ST method is a satisfiability checker and a validity checker at once, just like a truth table. Second, the analogy breaks down on complexity issues. In analytic tableaux, the order of decomposing compound formulas is the key to a minimal tableau; in STs, the key to an optimized use of the method is a clever choice of the variables introduced on each branch.

The main similarity between STs and Binary Decision Diagrams (BDDs, see e.g. [8,15]) is that both methods involve branching on variables. The main differences concern the representations they work on and their aims. Firstly, STs constitute a proof method, whereas BDDs are compact representations of Boolean functions, used mainly for practical aims such as the design of electronic circuits (VLSI design). Secondly, the ST method applies to logical formulas, whereas the construction of BDDs may start from different representations of Boolean functions, usually circuits or Boolean formulas.

The structure of the constructed tree is also slightly different in the two approaches. In BDDs the inner nodes correspond to variables, with outgoing edges labelled with 1 or 0; in STs, on the other hand, inner nodes are labelled with literals or more complex formulas. The terminal nodes of a BDD (also called sinks, labelled with 1 or 0) indicate the value of a Boolean function calculated for the arguments introduced along the path from the root, whereas the leaves of an ST carry a synthesized formula (the initial one or its negation). In addition, the methods differ in their construction process: in the case of BDDs, tree structures are first generated and then reduced to a more compact form using the elimination and merging rules; STs, in turn, are built ‘already reduced’. However, the outcomes of both constructions are interpreted analogously. For a formula A with n distinct variables p_1, ..., p_n and the associated Boolean function f_A = f_A(x_1, ..., x_n), the following holds: if a branch of an ST containing literals from a set L ends with A or ¬A synthesized (which means that assuming the literals from L to be true suffices to calculate the value of A), then the two reduction rules can be applied in a BDD for f_A so that the route containing the variables occurring in L, followed by edges labelled according to the signs in L, is directed to a terminal node (sink). For example, if A can be synthesized on a branch with literals ¬p_1, p_2 and ¬p_3, then f_A(0, 1, 0, x_4, ..., x_n) = 1 for all values of the variables x_4, ..., x_n, and so the route in the associated BDD containing the variables x_1, x_2 and x_3, followed by the edges labelled 0, 1 and 0 respectively, leads directly to the sink labelled 1.

However, the possibility of applying the reduction procedures in a BDD does not always correspond to the possibility of reducing an ST. For example, the reduced BDD for the formula p ∨ (q ∧ ¬q) consists of a single node labelled with p, with two edges directed straight to the sinks 1 and 0; on the other hand, the construction of an ST for this formula requires introducing q after the literal ¬p. This observation suggests that STs are, in general, larger than reduced BDDs.
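The elimination and merging rules can be illustrated by building a reduced ordered BDD bottom-up from a Boolean function. The sketch below is a naive illustration, not an efficient BDD package: it enumerates all 2^n paths and is therefore exponential. It assumes that the Boolean function is given as a Python callable over a dict of variable values, and the name `reduced_obdd` is hypothetical. For p ∨ (q ∧ ¬q) with the order (p, q), it yields a single inner node labelled p whose edges go straight to the sinks, as stated above.

```python
from typing import Callable, Dict, Sequence, Tuple


def reduced_obdd(f: Callable[[Dict[str, bool]], bool], order: Sequence[str]):
    """Return (root id, unique table); ids 0 and 1 are the sinks."""
    unique: Dict[Tuple[str, int, int], int] = {}  # (var, low id, high id) -> node id

    def mk(var: str, low: int, high: int) -> int:
        if low == high:                   # elimination rule: the test on `var` is redundant
            return low
        key = (var, low, high)
        if key not in unique:             # merging rule: reuse an identical node
            unique[key] = len(unique) + 2
        return unique[key]

    def build(i: int, env: Dict[str, bool]) -> int:
        if i == len(order):
            return 1 if f(env) else 0     # sink reached: value of f on this path
        x = order[i]
        low = build(i + 1, {**env, x: False})   # edge labelled 0
        high = build(i + 1, {**env, x: True})   # edge labelled 1
        return mk(x, low, high)

    return build(0, {}), unique


root, table = reduced_obdd(lambda v: v["p"] or (v["q"] and not v["q"]), ["p", "q"])
print(root, table)  # 2 {('p', 0, 1): 2}
```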

The strong similarity of the two methods is also illustrated by the fact that both allow the construction of a disjunctive normal form (DNF) of the logical or Boolean formula to which they are applied. In the case of an ST, the DNF is the disjunction of the conjunctions of literals that appear on branches ending with the formula synthesized. The smaller the ST, the smaller the DNF. The situation is analogous for BDDs.
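Once the branches are known, reading off the DNF is mechanical. The following is a minimal sketch. It assumes that each branch is given as a tuple of signed literals (strings, with a leading ‘¬’ for negation) together with a flag telling whether the formula itself (True) or its negation (False) was synthesized. The branch data below are our own hand-worked reading of T5 for C = (p → ¬q) → ¬(r → p), and the function names are hypothetical.

```python
from itertools import product

# Each branch: (literals on the branch, True if A was synthesized, False if ¬A was).
Branch = tuple[tuple[str, ...], bool]


def dnf_from_branches(branches: list[Branch]) -> list[tuple[str, ...]]:
    """Disjuncts of the DNF: the literal sets of branches ending with A synthesized."""
    return [lits for lits, a_synthesized in branches if a_synthesized]


def dnf_holds(dnf, valuation: dict[str, bool]) -> bool:
    def lit_true(lit: str) -> bool:
        return not valuation[lit[1:]] if lit.startswith("¬") else valuation[lit]
    return any(all(lit_true(l) for l in conj) for conj in dnf)


def imp(a: bool, b: bool) -> bool:
    return (not a) or b


def formula_c(v: dict[str, bool]) -> bool:
    # Truth-table evaluation of C = (p → ¬q) → ¬(r → p), used only as a cross-check.
    return imp(imp(v["p"], not v["q"]), not imp(v["r"], v["p"]))


t5 = [(("p", "q"), True), (("p", "¬q"), False), (("¬p", "r"), True), (("¬p", "¬r"), False)]
dnf = dnf_from_branches(t5)
print(" ∨ ".join("(" + " ∧ ".join(c) + ")" for c in dnf))  # (p ∧ q) ∨ (¬p ∧ r)

assert all(dnf_holds(dnf, dict(zip("pqr", bits))) == formula_c(dict(zip("pqr", bits)))
           for bits in product([False, True], repeat=3))
```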

Due to complexity issues, research on BDDs centers on ordered binary decision diagrams (OBDDs), in which different variables appear in the same order on all paths from the root. A number of heuristics have been proposed for constructing a variable ordering that leads to the smallest OBDDs, using characteristics of the different types of representation of Boolean functions (for example, for circuits, topological characteristics have been used for that purpose). OBDDs are clearly analogous to regular STs, whose construction also requires finding a good variable ordering, leading to a smaller ST. We suppose that our methodology can also be used to find orderings for OBDDs by expressing Boolean functions as logical formulas. It is not clear to us whether the OBDD methodology can be used in our framework.
Let us move on to other comparisons, this time in less detail. It is very instructive to compare the ST method to SAT solvers, as their effectiveness is undeniably impressive nowadays². The ST method does not aim at challenging this effectiveness. Let us explain, however, in what respect the ST method can still be viewed as a computationally attractive alternative to a SAT solver. The latter answers the question of satisfiability, sometimes also producing examples of satisfying valuations and/or counting the satisfying valuations. In order to answer another question, namely that of validity, one needs to ask about the satisfiability of the negated initial problem. As we stressed above, the ST method answers the two questions at once, providing at the same time a description of the classes of valuations satisfying and not satisfying the initial formula. Hence one ST is worth two SAT checks together with a rough model count.
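To see how one ST answers both questions, note that the branches of an ST partition the valuations: every application of (cut) splits on p versus ¬p, so a branch with j literals over n variables covers 2^(n − j) valuations. The minimal sketch below assumes this partition property and the same hypothetical branch encoding as before, namely literal tuples plus a flag for ‘A synthesized’. Under this assumption the count it produces is exact. The branch data are again our hand-worked reading of T5 for C = (p → ¬q) → ¬(r → p).

```python
from dataclasses import dataclass


@dataclass
class STSummary:
    satisfiable: bool
    valid: bool
    models: int   # number of valuations (over all n variables) satisfying A


def summarize(branches, n_vars: int) -> STSummary:
    """branches: list of (literals, a_synthesized) pairs forming one ST.
    A branch with j literals covers 2**(n_vars - j) valuations."""
    models = sum(2 ** (n_vars - len(lits)) for lits, a in branches if a)
    return STSummary(satisfiable=models > 0, valid=models == 2 ** n_vars, models=models)


t5 = [(("p", "q"), True), (("p", "¬q"), False), (("¬p", "r"), True), (("¬p", "¬r"), False)]
print(summarize(t5, 3))  # STSummary(satisfiable=True, valid=False, models=4)
```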

Another interesting point concerns clausal forms. The ST method does not require deriving a clausal form, but the applications of the rules of the system, defined via α-, β-notation, reflect the breaking of a formula into its components, and thus, in a way, lead to a definition of a normal form (a DNF, as mentioned above). This is not to say, however, that an ST requires a full conversion to DNF. In this respect the ST method is rather similar to non-clausal theorem provers (e.g. non-clausal resolution, see [9,17]).

Let us finish this section with a summary of the ST method. Formally, it is a proof method with many applications beyond the realm of CPL. In the area of CPL, semantically speaking, it is both a satisfiability and a validity checker: it displays the semantic properties of a formula like a truth table does, but can work more efficiently (in terms of the number of branches) than the latter. The key to this efficiency lies in the order of the variables introduced in an ST. In what follows we present a method of constructing such variable orders and examine our approach in an experimental setting.


4 Implementation 


The main functionality of the implementation described in this section is the construction of an ST for a formula according to an instruction provided by the user. If required, it can also produce all possible instructions for a given formula and build all STs according to them. In our research we have mainly used the second possibility.

² See [23, p. 2021]: contemporary SAT solvers can often handle practical instances with millions of variables and constraints.

The implemented algorithm generates non-canonical, possibly irregular STs. Let us start with some basics. Three main datatypes are employed. The first is a standard, recursively defined formula type, For, used to represent propositional formulas. The second is the Maybe monad over formulas, Maybe Formula (MF), consisting of Just Formula and Nothing. It is used to express whether the synthesis of a given formula on a given branch was successful (Just) or not (Nothing). The third, used to represent an ST, is the type of trees imported from Data.Tree. Thus every ST can be represented as Tree [MF] [Tree [MF]], that is, a tree labelled by lists of MF. We employed such a general structure with possible extensions to non-classical logics in mind (for CPL a binary tree is sufficient). The algorithm generating all possible STs for a given formula consists of the following steps:


1. We start by performing a few operations on the goal formula A:
(a) a list of all components of A and all components of ¬A, and a separate
list of the variables occurring in A (atoms A), are generated;
(b) the first list is sorted in such a way that all components of a given formula
in that list precede it (sort A).
2. After this initial step, based on the list atoms A, all possible instructions for 
the construction of an ST for A are generated (allRules (atoms A)). 
3. For each instruction from allRules (atoms A) we build an ST using the 
following strategy, called ‘compulsory’: 
(a) after each introduction of a literal (by (cut)) we try to synthesize (by the 
other rules) as many formulas from sort A as possible; 
(b) if no synthesizing rule is applicable, we look into the instruction to introduce
an appropriate literal and go back to (a). Let us note that T1, T2 and T5
are constructed according to this strategy.
4. Lastly, we generate a CSV file containing some basic information about each
generated tree, inter alia the number of nodes and whether the tree is a proof.
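The steps above can be sketched in a few dozen lines. The code below is a simplified Python re-creation, not the authors' Haskell code, and rests on several assumptions. It is restricted to ¬ and →, and it builds only regular STs, one permutation per tree, in the spirit of allRules (atoms A). It counts branches rather than nodes. All names are hypothetical. Because sort A puts every component before the formulas built from it, a single pass over the sorted list implements step 3(a). Under our encoding, every regular ST for C = (p → ¬q) → ¬(r → p) has at least 6 branches, while the irregular T5 has 4.

```python
from itertools import permutations

# Hypothetical formula encoding: ("var", "p"), ("not", A), ("imp", A, B).
def var(name): return ("var", name)
def neg(a): return ("not", a)
def imp(a, b): return ("imp", a, b)


def size(f):
    return 1 if f[0] == "var" else 1 + sum(size(g) for g in f[1:])


def components(f):
    """f together with all its components in the α/β sense (step 1a)."""
    out = {f}
    if f[0] == "imp":                                 # β: A → B from ¬A or from B
        out |= components(neg(f[1])) | components(f[2])
    elif f[0] == "not" and f[1][0] == "not":          # ¬¬A from A
        out |= components(f[1][1])
    elif f[0] == "not" and f[1][0] == "imp":          # α: ¬(A → B) from A and ¬B
        out |= components(f[1][1]) | components(neg(f[1][2]))
    return out


def atoms(f):
    return {f[1]} if f[0] == "var" else set().union(*(atoms(g) for g in f[1:]))


def synthesize(branch, sorted_comps):
    """Step 3(a): one pass suffices since components precede their compounds."""
    for f in sorted_comps:
        if f in branch:
            continue
        if f[0] == "imp":
            ok = neg(f[1]) in branch or f[2] in branch
        elif f[0] == "not" and f[1][0] == "not":
            ok = f[1][1] in branch
        elif f[0] == "not" and f[1][0] == "imp":
            ok = f[1][1] in branch and neg(f[1][2]) in branch
        else:
            ok = False                                # literals enter only by (cut)
        if ok:
            branch.add(f)
    return branch


def count_branches(a, instruction):
    """Branches of the regular ST for `a` built by the compulsory strategy."""
    sorted_comps = sorted(components(a) | components(neg(a)), key=size)  # 'sort A'

    def grow(branch, rest):
        branch = synthesize(set(branch), sorted_comps)
        if a in branch or neg(a) in branch:
            return 1                                  # A or ¬A synthesized: leaf
        p, rest = rest[0], rest[1:]                   # step 3(b): next variable
        return grow(branch | {var(p)}, rest) + grow(branch | {neg(var(p))}, rest)

    return grow(set(), tuple(instruction))


def all_regular(a):
    """Branch counts for every regular instruction (permutation of atoms A)."""
    return {order: count_branches(a, order) for order in permutations(sorted(atoms(a)))}


p, q, r = var("p"), var("q"), var("r")
B = imp(p, imp(q, p))
C = imp(imp(p, neg(q)), neg(imp(r, p)))
print(all_regular(B))                  # {('p', 'q'): 2, ('q', 'p'): 3}
print(min(all_regular(C).values()))    # 6
```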


Please observe that the length of a single branch is linear in the size of the formula; this follows from the fact that sort A contains only the components of A and ¬A. On the other hand, an ‘outburst’ of computational complexity occurs at the level of the number of STs. In general, if k is the number of distinct variables in a formula A, then for k = 3 there are 12 different canonical STs, and for k = 4 and k = 5 this number is, respectively, 576 and 1,658,880. For k = 6 the number of canonical STs per formula exceeds 10^13, and this approach is no longer feasible³.
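The counts above follow from a simple recursion. A canonical ST picks one of the k variables for the root, and both subtrees are then independent canonical STs over the remaining k − 1 variables. The Python sketch below evaluates this recursion together with the closed-form product from footnote 3, in the form in which we reconstruct it. It gives 12, 576 and 1,658,880 for k = 3, 4, 5 and a number above 10^13 for k = 6. The function names are hypothetical.

```python
from math import prod


def canonical_sts_closed_form(k: int) -> int:
    # Level i of a canonical ST has 2**(i - 1) (cut)-nodes, each free to pick
    # any of the k - i + 1 variables not yet used on its branch.
    return prod((k - i + 1) ** (2 ** (i - 1)) for i in range(1, k + 1))


def canonical_sts_recursive(k: int) -> int:
    # Pick the root variable (k ways); both subtrees are independent
    # canonical STs over the remaining k - 1 variables.
    return 1 if k == 0 else k * canonical_sts_recursive(k - 1) ** 2


for k in range(1, 7):
    print(k, canonical_sts_closed_form(k), canonical_sts_recursive(k))
```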

The Haskell implementation, together with the necessary documentation, is available at https://ddsuam.wordpress.com/software-and-data/.


5 dp-Measure and the Rest of Our Toolbox 


As we have already observed, in order to construct an optimal ST for a given formula one needs to make a clever choice of the literals to start with. The following function was defined to facilitate such choices. It assigns a rational value from the interval [0, 1] to each occurrence of a literal in a syntactic tree for formula A (in fact, it assigns values to all elements of Comp(A)). Intuitively, the value reflects the derivative power of the literal in synthesizing A.

³ It can be shown (e.g. by mathematical induction) that for formulas with k different variables, the total number of canonical STs is given by the following explicit formula:

$$\prod_{i=1}^{k} (k-i+1)^{2^{i-1}}.$$

The first case of the equation in Definition 3 makes the function total on Form × Form; it also agrees with the intended meaning of the defined measure: if B ∉ Comp(A), then B is of no use in deriving A. The second case expresses the starting point: to calculate the values of dp(A, B) for atomic B, one needs to assign dp(A, A) = 1; the value is then propagated down along the branches of the formula's syntactic tree. Dividing the value a by 2 in the fourth line reflects the fact that both components of an α-formula are needed to synthesize the formula. In order to use the measure, we need to calculate it for both A and ¬A, since we do not know whether A or ¬A will be synthesized on a given branch.


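Definition 3. dp: Form × Form → [0, 1]

$$
dp(A, B) =
\begin{cases}
0 & \text{if } B \notin Comp(A),\\
1 & \text{if } B = A,\\
a & \text{if } dp(A, \neg\neg B) = a,\\
\frac{a}{2} & \text{if } B \in \{C, \neg D\} \text{ and } dp(A, \neg(C \to D)) = a,\\
a & \text{if } B \in \{\neg C, D\} \text{ and } dp(A, C \to D) = a.
\end{cases}
$$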
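Because dp works on occurrences, it is natural to compute it by walking the component tree of A from the root, which gets dp(A, A) = 1, and pushing values down according to Definition 3. The following is a minimal sketch, assuming a hypothetical tuple encoding of formulas restricted to ¬ and →. An empty result corresponds to the first case, dp(A, B) = 0.

```python
# Hypothetical formula encoding (¬ and → only, as in Definition 3).
def var(n): return ("var", n)
def neg(a): return ("not", a)
def imp(a, b): return ("imp", a, b)


def dp_occurrences(a):
    """(component, dp-value) for every node of the component tree of `a`."""
    out = []

    def walk(f, val):
        out.append((f, val))
        if f[0] == "not" and f[1][0] == "not":      # ¬¬B: B inherits the value
            walk(f[1][1], val)
        elif f[0] == "not" and f[1][0] == "imp":    # α: ¬(C → D): C and ¬D get val / 2
            walk(f[1][1], val / 2)
            walk(neg(f[1][2]), val / 2)
        elif f[0] == "imp":                          # β: C → D: ¬C and D inherit val
            walk(neg(f[1]), val)
            walk(f[2], val)

    walk(a, 1.0)                                     # dp(A, A) = 1
    return out


def dp_values(a, lit):
    """All dp-values of occurrences of `lit` in `a`; [] means dp(A, lit) = 0."""
    return [v for f, v in dp_occurrences(a) if f == lit]


p, q = var("p"), var("q")
B = imp(p, imp(q, p))
print(dp_values(B, p), dp_values(neg(B), neg(p)))  # [1.0] [0.25]
```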


Example 4. A visualization of calculating dp for formulas B, C from Examples 2 and 3, and for D = (p → ¬p) → p.

[Figure: syntactic trees of B = p → (q → p), C = (p → ¬q) → ¬(r → p) and D, and of their negations ¬B, ¬C, ¬D, with every node annotated with its dp-value.]


As one can see from Example 4, applying the dp measure to a formula and its negation yields a number of values that need to be aggregated in order to obtain a clear instruction for constructing an ST. However, some conclusions can be drawn already from the above example. It seems clear that the value dp(p → (q → p), p) = 1 corresponds to the fact that p is sufficient to synthesize the whole formula (as witnessed by T4, see Example 1). The same holds for ¬p. On the other hand, although ¬q is sufficient to synthesize the formula, q is not (see T2, Example 1); hence the choice between p and q is clear. But it seems to be the only obvious choice at the moment. In the case of the second formula, every literal gets the same value: 0.5. What is more, in the case of longer formulas a situation like the one depicted by the rightmost syntactic trees is very likely to happen: we obtain dp(D, p) = 0.5 twice (since dp works on occurrences of literals), and dp(¬D, ¬p) = 0.5 three times.

In the aggregation of the dp-values we use the parametrised Hamacher s-norm, defined for a, b ∈ [0, 1] as follows:

$$a \mathbin{s_\lambda} b = \frac{a + b - ab - (1-\lambda)ab}{1 - (1-\lambda)ab},$$

for which we have taken λ = 0.1, as this value turned out to give the best results. The Hamacher s-norm can be seen as a fuzzy disjunction (alternative); it is commutative and associative, hence it is straightforward to extend its application to an arbitrary finite number of arguments. For a = b = c = 0.5 we obtain:

$$a \mathbin{s_\lambda} b \approx 0.677 \quad\text{and}\quad (a \mathbin{s_\lambda} b) \mathbin{s_\lambda} c \approx 0.768.$$
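Both numbers can be reproduced directly. The short sketch below implements the s-norm as defined above. Associativity justifies folding it over a list, and with a single argument the fold simply returns that argument, matching the convention h(A, l) = dp(A, l) for a single occurrence. The function names are hypothetical. A positive λ also avoids the 0/0 case at a = b = 1.

```python
from functools import reduce


def hamacher_s(a: float, b: float, lam: float = 0.1) -> float:
    """Parametrised Hamacher s-norm (t-conorm) on [0, 1]; lam > 0 avoids 0/0 at a = b = 1."""
    return (a + b - a * b - (1 - lam) * a * b) / (1 - (1 - lam) * a * b)


def hamacher_all(values, lam: float = 0.1) -> float:
    """Fold over finitely many arguments (valid because the s-norm is associative)."""
    return reduce(lambda x, y: hamacher_s(x, y, lam), values)


print(round(hamacher_s(0.5, 0.5), 3), round(hamacher_all([0.5, 0.5, 0.5]), 3))  # 0.677 0.768
```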


The value of this norm is calculated for a formula A and a literal l by taking the dp-values dp(A, l) for each occurrence of l in the syntactic tree of A. This value will be denoted as ‘h(A, l)’; if there is only one value dp(A, l), we take h(A, l) = dp(A, l). Hence, referring to Example 4 above, we have e.g. h(B, p) = 1, h(¬B, ¬p) = 0.25 and h(¬D, ¬p) ≈ 0.768.

Finally, the function H is defined for variables, not their occurrences, in formula A as follows:

$$H(A, p_i) = \frac{\max\big(h(A, p_i), h(\neg A, p_i)\big) + \max\big(h(A, \neg p_i), h(\neg A, \neg p_i)\big)}{2}$$

The important property of this apparatus is that for 0 < a, b < 1 we have a s_λ b > max{a, b}, and thus h(A, l) and H(A, p_i) are sensitive to the number of aggregated elements. Another desirable feature of the introduced functions is that h(A, p_i) = 1 indicates that one can synthesize A on a branch starting with p_i without further applications of (cut); furthermore, H(A, p_i) = 1 indicates that both p_i and ¬p_i have this property.

Let us stress that the values of dp, h and H are very easy to calculate. Given a formula A, we need to assign a dp-value to each of its components, and the number of components is linear in the length of A. On the other hand, the information gained by these calculations is sometimes insufficient. The assignment dp(A, p_i) = 2^(−m) says only that A can be built from p_i and m other components of A, but it gives no clue as to which components are needed. In Example 4, H works perfectly, as we have H(B, p) = 1 and H(B, q) = 0.625; hence H indicates the following instruction for the construction of an ST: {(p, q)}. Unfortunately, in the case of formula C we have H(C, p) = H(C, q) = H(C, r) = 0.5, so a more sophisticated solution is needed.
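Putting dp, the Hamacher s-norm, h and H together yields a complete scoring pipeline for a formula. The self-contained sketch below reuses the hypothetical tuple encoding of formulas and takes C to be (p → ¬q) → ¬(r → p), which is our reading of Example 4. It reproduces H(B, p) = 1, H(B, q) = 0.625 and the uninformative H(C, ·) = 0.5. The helper `instruction_by_H` is hypothetical and simply sorts variables by decreasing H, leaving ties in input order.

```python
from functools import reduce

def var(n): return ("var", n)
def neg(a): return ("not", a)
def imp(a, b): return ("imp", a, b)


def dp_occurrences(a):
    out = []

    def walk(f, val):
        out.append((f, val))
        if f[0] == "not" and f[1][0] == "not":       # ¬¬B
            walk(f[1][1], val)
        elif f[0] == "not" and f[1][0] == "imp":     # α-formula: halve the value
            walk(f[1][1], val / 2)
            walk(neg(f[1][2]), val / 2)
        elif f[0] == "imp":                           # β-formula: keep the value
            walk(neg(f[1]), val)
            walk(f[2], val)

    walk(a, 1.0)
    return out


def hamacher_s(a, b, lam=0.1):
    return (a + b - a * b - (1 - lam) * a * b) / (1 - (1 - lam) * a * b)


def h(a, lit, lam=0.1):
    """Aggregate the dp-values of all occurrences of `lit` in `a` (0 if none)."""
    vals = [v for f, v in dp_occurrences(a) if f == lit]
    return reduce(lambda x, y: hamacher_s(x, y, lam), vals) if vals else 0.0


def H(a, name, lam=0.1):
    pos, ng, na = var(name), neg(var(name)), neg(a)
    return (max(h(a, pos, lam), h(na, pos, lam)) + max(h(a, ng, lam), h(na, ng, lam))) / 2


def instruction_by_H(a, names):
    """Variables sorted by decreasing H (ties keep their input order)."""
    return sorted(names, key=lambda n: -H(a, n))


p, q, r = var("p"), var("q"), var("r")
B = imp(p, imp(q, p))
C = imp(imp(p, neg(q)), neg(imp(r, p)))
print(H(B, "p"), H(B, "q"), instruction_by_H(B, ["q", "p"]))  # 1.0 0.625 ['p', 'q']
print([H(C, n) for n in "pqr"])                                # [0.5, 0.5, 0.5]
```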


6 Data 


At the very beginning of the process of data generation we faced the following general problem: how can one draw conclusive inferences about an infinite population (all of Form) on the basis of finite data? Considering the methodological problems connected with applying classical statistical inference methods in this context, we limited our analysis to descriptive statistics, exploratory analysis and testing. To make this as informative as possible, we took a ‘big data’ approach: for every formula we generated all possible STs, differing in the order of applications of (cut) on particular branches. In addition, where feasible, we generated all possible formulas falling under some syntactic specifications. The approach is aimed at testing different optimisation methods as well as exploring the data in search of patterns and new hypotheses. The knowledge gained in this way is further used on samples of longer formulas to examine tendencies.

From now on we use l for the length of a formula, k for the number of distinct variables occurring in a formula, and n for the number of all occurrences of variables (leaves, if we think of formulas as trees). In the first stage we examined a dataset containing all possible STs for formulas with l = 12 and k ≤ 4. There are over 33 million different STs already for these modest values; for larger k the data to analyse was simply too big. We generated 242,265 formulas, from which we later removed those with k ≤ 2 and/or k = n, as the results for them were not interesting. For further datasets we also generated all possible STs, but the formulas were longer and randomly generated⁴. Thus we considered (i) 400 formulas with l = 23, k = 3, (ii) 400 formulas with l = 23, k = 4, and (iii) 100 formulas with l = 23, k = 5. In all cases 9 ≤ n ≤ 12; this value is to be combined with the occurrences of negation in a formula: the smaller n, the more occurrences of negation.

Having generated all possible STs for a formula, we could simply check the optimal ST size for this formula. The idea was to look for possible relations between, on the one hand, instructions producing small STs and, on the other hand, properties of formulas that are easy to calculate, like dp or the numbers of occurrences of variables. The first dataset included only relatively small formulas; however, with all possible formulas of a given type available, it was possible, e.g., to track various types of ‘unusual’ behaviour of formulas and all potential problematic issues regarding the optimisation methods, which could remain unnoticed if only random samples of formulas were generated. In the case of randomly generated formulas, the ‘special’ or ‘difficult’ types of formulas may not be tracked (as the probability of drawing them may be small), but instead we get an idea of an ‘average’ formula, or of the average behaviour of the optimisation methods. By generating all the STs, in turn, we gained full information not only about the regular but also about the irregular STs, which is the basis for indicating the set of optimal STs and for evaluating the optimisation methods.


7 Data Analysis and a Discussion of Results 


In this section we present some results of analyses performed on our data. The main purpose of the analyses is to test the effectiveness of the function H in terms of indicating a small ST. Moreover, we performed different types of exploratory analysis on the data, aiming at understanding the variation of size among all STs for different formulas, and how it relates to the effectiveness of H.

⁴ The algorithm for generating random formulas is described in [11]. The author also prepared a Haskell implementation of the algorithm; see https://github.com/kiryk/random-for.

Fig. 1. Distribution of the difference between the size of a maximal and that of a minimal ST for formulas with k = 4, 5.

Most results will be presented for the five combinations of the values of l and k in our data, that is, l = 12, k ∈ {3, 4} and l = 23, k ∈ {3, 4, 5}; however, some results will be presented with k = 3 and k = 4 grouped together (where the difference between them is insignificant), and the charts are presented only for k ≥ 4.

We will examine the variation of size among STs using a range statistic: by the range of the size of ST for a formula A (ST range, for short) we mean the difference between the sizes of a maximal and a minimal ST for A; this value indicates the possible room for optimization. The size of a maximal ST is bounded by the size of a canonical ST for the formula, which depends only on k: for k = 4 a canonical ST has 16 branches, and for k = 5 it has 32 branches.

The histograms in Fig. 1 present the distributions of ST range for formulas with k = 4 and k = 5. The rightmost bar in the histogram for l = 23, k = 5 says that for 5 (out of 100) formulas there are STs with only two branches, whereas the maximal STs for these formulas have 32 branches. We can also read from the histograms that for formulas with k = 4 the ST range of some formulas is equal to 0 (7.9% of formulas with l = 12 and 3.5% with l = 23), which means that all their STs have the same size. We decided to exclude these formulas from the tests of the efficiency of H, as they leave no room for optimization. However, as can be seen in the histogram, there were no formulas of this kind among those with k = 5. This indicates that as k increases, the internal differentiation of the set of STs for a formula increases as well, leading to a smaller share of formulas with a small ST range.

Two more measures relating to the distribution of the size of ST may be of interest. Firstly, the share of formulas for which no regular ST is of optimal size: it indicates how wrong we can be in pointing only to regular STs. Secondly, the percentage share of optimal STs among all STs for a given formula: it gives an idea of the chance of picking an optimal ST at random. Table 2 presents both values for formulas depending on k and l (recall that formulas with ST range equal to 0 are excluded from the analysis). In both cases we can clearly see a tendency with growing k. As expected, the table shows that the average share of optimal STs depends on the value of k rather than on the size of the formula. This is understandable: the number of branches depends on k only, while the length of a formula translates into the length of branches, and the latter is linear in the former. In a way, this explains why the results are almost identical when the size of STs is calculated in terms of nodes rather than branches (as we mentioned above, the overall correlation between the two measures makes the choice between them irrelevant).

Table 2. Row A: the share of formulas that do not have a regular ST of optimal size. Row B: the share of optimal STs among all STs for a formula; this was first calculated for each formula, then averaged over all formulas in a given set.

We can categorise the output of the function H into three main classes. In the first case, the values assigned to variables by H strictly order the variables, which results in one specific instruction for the construction of a regular ST. The overall share of such unique indications was very high: 70.9% for formulas with l = 12, 92.0% for l = 23, k = 3, 4, and 72.0% for k = 5. The second possibility is that H assigns the same value to each variable; in this case we gain no information at all (recall that we have excluded the only cases that could justify such assignments, that is, the formulas for which each ST is of the same size). The share of such formulas in our datasets was small: 0.6% for l = 12, 0.1% for l = 23, k = 3, 4 and 0% for k = 5, suggesting that it tends to fall as k rises. The third possibility is that the ordering is not strict, yet some information is gained: for some, but not all, variables the value of H is the same.

The methodology used to assess the effectiveness of H is quite simple. We assume that every indication must be a single regular instruction; hence, for formulas of the second and third kind described above, we use additional criteria in order to obtain a strict ordering. If H outputs the same value for some variables, we first order these variables by their number of occurrences in the formula; if the ordering is still not strict, we give priority to variables for which the sum of depths of all occurrences of their literals in the syntactic tree is smaller; finally, where the above criteria do not provide a strict ordering, the order is chosen at random.
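The tie-breaking cascade amounts to a lexicographic sort key. The sketch below is our illustration of it. Since the text does not state the direction of the second criterion, we assume that variables with more occurrences come first. The `VarStats` record, its fields and the sample values are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VarStats:
    name: str
    h_value: float     # value of H for the variable
    occurrences: int   # number of occurrences of the variable in the formula
    depth_sum: int     # sum of depths of all occurrences of its literals


def regular_instruction(stats, rng=None):
    """Strict variable order: larger H first, then (assumption) more occurrences,
    then a smaller depth sum, and finally a random tie-break."""
    rng = rng or random.Random()
    return [s.name for s in sorted(
        stats, key=lambda s: (-s.h_value, -s.occurrences, s.depth_sum, rng.random()))]


# Hypothetical statistics for four variables.
stats = [VarStats("p", 0.5, 3, 4), VarStats("q", 0.7, 1, 1),
         VarStats("r", 0.5, 3, 2), VarStats("s", 0.5, 2, 1)]
print(regular_instruction(stats))  # ['q', 'r', 'p', 's']
```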

We used three evaluating functions to assess the quality of indications. Each function takes as arguments a formula and the ST for this formula indicated by our heuristics. The first function (F1 in Table 3) outputs 1 if the indicated ST is of optimal size, and 0 otherwise. The second function (F2 in Table 3) outputs the difference between the size of the indicated ST and the optimal size. The third function is called proximity to optimal tableau, POT_A in symbols:

$$\mathrm{POT}_A(T) = 1 - \frac{|T| - \mathrm{min}_A}{\mathrm{max}_A - \mathrm{min}_A}$$

where T is the ST for formula A indicated by H, |T| is the size of T, max_A is the size of an ST for A of maximal size, and min_A is the size of an optimal ST for A. Later on we skip the relativization to A. Let us observe that the value (|T| − min)/(max − min) represents the mistake in indication relative to the ST range of a formula, and in this sense POT can be considered a standardized measure of the quality of indication. Finally, the values of each of the three evaluating functions were calculated for sets of formulas by taking average values over all formulas in the set.

Table 3. The third column gives the number of formulas satisfying the characteristic presented in the first and the second column. The further three columns display values averaged over the sets. F1 indicates how often we indicate an optimal ST. F2 reports the mistake of our indication, calculated as the difference of sizes between the indicated ST and an optimal one. Finally, POT indicates proximity to an optimal ST in a standardized way.

k | l  | no. of formulas | F1    | F2    | POT
3 | 12 | 113,190         | 0.935 | 0.089 | 0.974
3 | 23 | 400             | 0.923 | 0.104 | 0.966
4 | 12 | 53,130          | 0.859 | 0.286 | 0.966
4 | 23 | 400             | 0.836 | 0.297 | 0.960
5 | 23 | 100             | 0.75  | 0.52  | 0.971
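The three evaluating functions and their averaging can be stated compactly in code. The sketch below assumes that, for each formula, the size of the indicated ST and the sizes of all its STs are available, as in the exhaustive datasets. Formulas with ST range 0 are rejected because POT is undefined for them. The names and the sample sizes are hypothetical.

```python
from statistics import mean


def evaluate(indicated: int, all_sizes: list[int]) -> tuple[int, int, float]:
    """F1, F2 and POT for one formula, given the size of the indicated ST and
    the sizes of all STs for that formula."""
    lo, hi = min(all_sizes), max(all_sizes)
    if hi == lo:
        raise ValueError("ST range is 0: formula excluded from the evaluation")
    f1 = int(indicated == lo)                 # 1 iff the indicated ST is optimal
    f2 = indicated - lo                       # excess size over an optimal ST
    pot = 1 - (indicated - lo) / (hi - lo)    # mistake relative to the ST range
    return f1, f2, pot


def evaluate_set(cases):
    """Average each evaluating function over (indicated, all_sizes) cases."""
    rows = [evaluate(ind, sizes) for ind, sizes in cases]
    return tuple(mean(col) for col in zip(*rows))


print(evaluate(4, [2, 4, 4, 6, 8]))  # F1 = 0, F2 = 2, POT = 1 - 2/6
```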

The results of the three functions presented in Table 3 show that optimal STs are indicated less often for formulas with greater k; however, the POT values seem to remain stable across all data, indicating that, on average, the proximity of the indicated ST to the optimal ones does not depend on k or l.

Further analysis showed that the factor that most influenced the efficiency of our methodology was whether there is at least one value 1 among the dp-values of literals for a formula A. We write ‘Max(dp) = 1’ if this is the case, and ‘Max(dp) < 1’ otherwise (we skip the relativisation to A for simplicity). For formulas with Max(dp) = 1, the results of the evaluating functions were much better. For example, the value of the POT function for formulas with l = 12 was 0.979 if Max(dp) = 1, and 0.814 for those with Max(dp) < 1; for formulas with l = 23, k = 3, 4 those values were 0.968 and 0.869, respectively, and for formulas with l = 23, k = 5 they were 0.974 and 0.901, respectively. This shows that our methodology works significantly worse if Max(dp) < 1; on the other hand, if Max(dp) = 1, the dp measure works very well. It should also be pointed out that the difference between the POT values for the two groups is smaller for formulas with greater l and k. Figure 2 presents a scatter plot that gives an idea of the whole distribution of the values of the POT function in relation to the ST range. Each formula on the plot is represented by a point, the colours additionally indicating whether Max(dp) < 1. The chart suggests, similarly to Table 3, that the method works well as the values of l and k rise, indicating STs that are on average equally close to the optimal ones.

Fig. 2. Distribution of the difference between indicated and optimal ST in relation to ST range. Every point corresponds to a formula; the points are slightly jittered in order to improve readability. Each chart corresponds to different data; formulas with k = 3 are excluded. Additionally, the colour indicates whether Max(dp) = 1 for a formula.

One can point to two possible explanations of the fact that our methodology works worse for formulas with Max(dp) < 1. Firstly, if, e.g., dp(A, p) = 2^(−m), we only obtain the information that, apart from p, m more occurrences of components of A are required in order to synthesize the whole formula. Secondly, the function H neglects the complex dependencies between the various aggregated occurrences of a given variable, taking into account only the number of occurrences of literals in an aggregated group. However, considering the very low computational complexity of the method based on the dp-values and the function H, the outlined framework seems to provide good heuristics for indicating small STs. Methods that reflect more aspects of the complex structure of logical formulas would likely require much more computational resources.

On a final note, we would like to add that exploring the data allowed us to study properties of formulas that go beyond the scope of the optimisation of STs. The data was used in a similar way as in so-called Experimental Mathematics, where numerous instances are analysed and visualized in order to, e.g., gain insight, search for new patterns and relationships, test conjectures and introduce new concepts (see e.g. [1]).
Table 4. The pigeonhole principle

PHP_m          | the size of ST
m | k  | l   | indicated by H | minimal | canonical ST
1 | 2  | 7   | 3              | 3       | 4
2 | 6  | 34  | 15             | 11      | 64
3 | 12 | 90  | 99             | 43      | 4096
4 | 20 | 184 | 783            | 189     | 2^20


8 The Pigeonhole Principle 


Finally, we consider the propositional version of the pigeonhole principle introduced by Cook and Reckhow in [3, p. 43]. In the field of proof complexity, the principle was used to prove that resolution is intractable, that is, any resolution proof of the propositional pigeonhole principle must be of exponential size (w.r.t. the size of the formula). This was proved by Haken in [10]; see also [2].

Here is PHP_m in the propositional version:

$$\bigwedge_{0 \le i \le m} \; \bigvee_{0 \le j < m} p_{i,j} \;\to\; \bigvee_{0 \le i < n \le m} \; \bigvee_{0 \le j < m} (p_{i,j} \wedge p_{n,j})$$

where ⋀ and ⋁ stand for generalized conjunction and disjunction, respectively, with the range indicated beneath.
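For small m the formula can be generated and checked by brute force. The sketch below treats p_{i,j} as ‘pigeon i sits in hole j’, for m + 1 pigeons and m holes. It confirms that PHP_m is a tautology for m ≤ 3, and it shows that PHP_m has k = m(m + 1) variables, so that a canonical ST, like a truth table, has 2^k branches: 4, 64, 4096 and 2^20 for m = 1, ..., 4. The function names are hypothetical.

```python
from itertools import product


def php_variables(m: int):
    """p_{i,j}: pigeon i (0 ≤ i ≤ m) sits in hole j (0 ≤ j < m)."""
    return [(i, j) for i in range(m + 1) for j in range(m)]


def php_true(m: int, v: dict) -> bool:
    every_pigeon_placed = all(any(v[i, j] for j in range(m)) for i in range(m + 1))
    some_hole_shared = any(v[i, j] and v[n, j]
                           for i in range(m + 1) for n in range(i + 1, m + 1)
                           for j in range(m))
    return (not every_pigeon_placed) or some_hole_shared


def php_is_tautology(m: int) -> bool:
    """Brute-force truth-table check over 2**k rows, k = m(m + 1)."""
    vs = php_variables(m)
    return all(php_true(m, dict(zip(vs, bits)))
               for bits in product([False, True], repeat=len(vs)))


for m in range(1, 5):
    print(m, len(php_variables(m)), 2 ** len(php_variables(m)))
print(all(php_is_tautology(m) for m in (1, 2, 3)))  # True
```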

The pigeonhole principle displays a perfect symmetry in the roles played by its variables. Each variable has the same number of occurrences in the formula, each of them gets the same value under H, and they all have occurrences at the same depths of the syntactic tree. All this means that in our account we can only suggest a random regular ST. However, it is worth noticing that, first, H behaves consistently with the structure of the formula, and second, the result is still attractive. In Table 4 the fourth column presents the size of the ST indicated by our heuristics, that is, in fact, generated by a random ordering of variables. It is to be contrasted with the number 2^k in the last column, describing the size of a canonical ST for the formula, which is at the same time the number of rows in a truth table for the formula. The minimal STs for the formulas were found with pen and paper, and they are irregular.


9 Summary and Further Work 


We presented the proof method of Synthetic Tableaux for CPL and explained how the efficiency of tableau construction depends on the choice of the variables to which (cut) is applied. We defined possible algorithms for choosing the variables and experimentally tested their efficiency.

Our plan for further research is well defined: to implement heuristics capable of producing instructions for irregular STs. We already have an algorithm, as yet untested.
As far as proof-theoretical aims are concerned, the next task is to extend and adjust the framework to the first-order level, based on the already described ST system for first-order logic [14]. We also wish to examine the efficiency of our indications on propositional non-classical logics for which the ST method exists (see [20,22]). In the area of data analysis, another possible step would be to perform more complex statistical analysis using, e.g., machine learning methods.
Abstract. We introduce a paraconsistent modal logic $\mathbf{KG}^2$, based on Gödel logic with coimplication (bi-Gödel logic) expanded with a De Morgan negation $\neg$. We use the logic to formalise reasoning with graded, incomplete and inconsistent information. The semantics of $\mathbf{KG}^2$ is two-dimensional: we interpret $\mathbf{KG}^2$ on crisp frames with two valuations $v_1$ and $v_2$, connected via $\neg$, that assign to each formula two values from the real-valued interval $[0,1]$. The first (resp., second) valuation encodes the positive (resp., negative) information the state gives to a statement. We show that $\mathbf{KG}^2$ is strictly more expressive than the classical modal logic $\mathbf{K}$ by proving that finitely branching frames are definable and by establishing a faithful embedding of $\mathbf{K}$ into $\mathbf{KG}^2$. We also construct a constraint tableau calculus for $\mathbf{KG}^2$ over finitely branching frames, establish its decidability and provide a complexity evaluation.


Keywords: Constraint tableaux · Gödel logic · Two-dimensional logics · Modal logics


1 Introduction 


People believe in many things. Sometimes, they even have contradictory beliefs. Sometimes, they believe in one statement more than in another. However, if a person has contradictory beliefs, they are not bound to believe in anything. Likewise, believing in $\phi$ strictly more than in $\chi$ does not make one believe in $\phi$ completely. These properties of beliefs are natural, and yet hardly expressible in classical modal logic. In this paper, we present a two-dimensional modal logic based on Gödel logic that can formalise beliefs taking these traits into account.


Two-Dimensional Treatment of Uncertainty. Belnap–Dunn four-valued logic (BD, or First Degree Entailment, FDE) [4,16,34] can be used to formalise


The research of Marta Bílková was supported by the grant 22-01137S of the Czech 
Science Foundation. The research of Sabine Frittella and Daniil Kozhemiachenko was 
funded by the grant ANR JCJC 2019, project PRELAP (ANR-19-CE48-0006). This 
research is part of the MOSAIC project financed by the European Union’s Marie 
Skłodowska-Curie grant No. 101007627. 

© The Author(s) 2022 


J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 429-448, 2022. 
https://doi.org/10.1007/978-3-031-10769-6_26 


430 M. Bílková et al. 


reasoning with both incomplete and inconsistent information. In BD, formulas are evaluated on the De Morgan algebra $\mathbf{4}$ (Fig. 1, left), where the four values $\{t, f, b, n\}$ encode the information available about the formula: true, false, both true and false, neither true nor false. Thus, $b$ and $n$ represent inconsistent and incomplete information, respectively. It is important to note that the values represent the available information about the statement, not its intrinsic truth or falsity. Furthermore, this approach essentially treats evidence for a statement (its positive support) as independent of evidence against it (its negative support), which allows us to differentiate between ‘absence of evidence’ and ‘evidence of absence’. The BD negation $\neg$ then swaps positive and negative supports.




Fig. 1. $\mathbf{4}$ (left) and its continuous extension $[0,1]^{\Join}$ (right). $(x,y) \le_{\Join} (x',y')$ iff $x \le x'$ and $y \ge y'$.


The information regarding a statement, however, might itself not be crisp: after all, our sources are not always completely reliable. Thus, to capture the uncertainty, we extend $\mathbf{4}$ to the lattice $[0,1]^{\Join}$ (Fig. 1, right). $[0,1]^{\Join}$ is a twist product (cf. [37] for definitions) of $[0,1]$ with itself: the order on the second coordinate is reversed w.r.t. the order on the first coordinate. This captures the intuition behind the usual ‘truth’ (upwards) order: an agent is more certain in $\chi$ than in $\phi$ when the evidence for $\chi$ is stronger than the evidence for $\phi$, while the evidence against $\chi$ is weaker than the evidence against $\phi$.

Note that $[0,1]^{\Join}$ is a bilattice whose left-to-right order can be interpreted as the information order. This links the logics we consider to bilattice logics applied to reasoning in AI in [19] and then studied further in [24,35].
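The two orders can be made concrete on pairs (positive support, negative support). The Python sketch below encodes the four BD values as corner points of $[0,1]^{\Join}$ and implements the upwards truth order from Fig. 1. For the left-to-right order it assumes the standard information order of twist-product bilattices, $(x,y) \le_k (x',y')$ iff $x \le x'$ and $y \le y'$; the text above only describes it as the left-to-right order, and the names are ours.

```python
# Values are pairs (x, y): x = evidence for (positive support),
# y = evidence against (negative support).
FOUR = {"t": (1, 0), "f": (0, 1), "b": (1, 1), "n": (0, 0)}


def truth_leq(a, b):
    """Upwards truth order: (x, y) <= (x', y') iff x <= x' and y >= y'."""
    return a[0] <= b[0] and a[1] >= b[1]


def info_leq(a, b):
    """Left-to-right information order (assumed standard twist-product order)."""
    return a[0] <= b[0] and a[1] <= b[1]


t, f, b, n = (FOUR[k] for k in "tfbn")
# f is the bottom and t the top of the truth order.
assert all(truth_leq(f, v) and truth_leq(v, t) for v in FOUR.values())
# b and n are incomparable in the truth order but ordered by information.
assert not truth_leq(b, n) and not truth_leq(n, b)
assert info_leq(n, b)
# The example from the introduction: (0.5, 0.3) and (0, 0) are incomparable.
assert not truth_leq((0.5, 0.3), (0, 0)) and not truth_leq((0, 0), (0.5, 0.3))
```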


Comparing Beliefs. Uncertainty is manifested not only in the non-crisp character of the information. An agent might often lack the capacity to establish the concrete numerical value that represents their certainty in a given statement. Indeed, ‘I am 43% certain that the wallet is Paula’s’ does not sound natural. On the other hand, it is reasonable to assume that the agents’ beliefs can be compared in most contexts: neither ‘I am more confident that the wallet is Paula’s than that the wallet is Quentin’s’ nor ‘Alice is more certain than Britney that Claire loves pistachio ice cream’ requires us to give a concrete numerical representation to the (un)certainty.

These considerations lead us to choose the two-dimensional relative of Gödel logic dubbed $\mathbf{G}^2$ as the propositional fragment of our logic. $\mathbf{G}^2$ was introduced in [5] and is, in fact, an extension of Moisil’s logic¹ from [31] with the prelinearity axiom $(p \to q) \vee (q \to p)$. As in the original Gödel logic $\mathbf{G}$, the validity of a formula in $\mathbf{G}^2$ depends not on the values of its constituent variables but on the relative order between them. In this sense, $\mathbf{G}$ is a logic of comparative truth. Thus, as we treat positive and negative supports of a given statement independently, $\mathbf{G}^2$ is a logic of comparative truth and falsity. Note that while the values of two statements may not be comparable (say, $p$ is evaluated as $(0.5, 0.3)$ and $q$ as $(0, 0)$), their coordinates always are. We will see in Sect. 2 how we can formalise statements comparing agents’ beliefs.

The sources available to the agents, as well as the references between these sources, can be represented as states in a Kripke model and its accessibility relation, respectively. It is important to mention that we account for the possibility that a source can give us contradictory information regarding some statement. Still, we want our reasoning with such information to be non-trivial. This is reflected by the fact that $(p \wedge \neg p) \to q$ is not valid in $\mathbf{G}^2$. Thus, the logic (treated as a set of valid formulas) lacks the explosion principle. In this sense, we call $\mathbf{G}^2$ and its modal expansions ‘paraconsistent’. This links our approach to other paraconsistent fuzzy logics, such as the ones discussed in [17].

To reason with the information provided by the sources, we introduce two interdefinable modalities, $\Box$ and $\Diamond$, interpreted as infima and suprema w.r.t. the upwards order on $[0,1]^{\Join}$. We mostly assume (unless stated otherwise) that accessibility relations in models are crisp. Intuitively, this means that the sources are either accessible or not (and, likewise, either refer to other ones or not).


Broader Context. This paper is part of the project introduced in [6] and carried on in [5], aiming to develop a modular logical framework for reasoning based on uncertain, incomplete and inconsistent information. We model agents who build their epistemic attitudes (like beliefs) based on information aggregated from multiple sources. $\Box$ and $\Diamond$ can then be viewed as two simple aggregation strategies: a pessimistic one (the infimum of the positive support and the supremum of the negative support) and an optimistic one (the dual strategy), respectively. They can be defined via one another using $\neg$ in the expected manner: $\Box\phi$ stands for $\neg\Diamond\neg\phi$ and $\Diamond\phi$ for $\neg\Box\neg\phi$. In this paper, in contrast to [15] and [6], we do allow modalities to nest.
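With finitely many sources, the two aggregation strategies reduce to minima and maxima over (positive, negative) support pairs. The Python sketch below (function names hypothetical) computes the pessimistic aggregation $\Box$ and the optimistic aggregation $\Diamond$ of a list of such pairs, using $\inf\varnothing = 1$ and $\sup\varnothing = 0$ when no source is available, and checks the interdefinability $\Box\phi = \neg\Diamond\neg\phi$.

```python
def neg(v):
    """De Morgan negation: swap positive and negative support."""
    return (v[1], v[0])


def box(sources):
    """Pessimistic: least positive support, greatest negative support."""
    return (min((x for x, _ in sources), default=1.0),
            max((y for _, y in sources), default=0.0))


def diamond(sources):
    """Optimistic (dual): greatest positive support, least negative support."""
    return (max((x for x, _ in sources), default=0.0),
            min((y for _, y in sources), default=1.0))


sources = [(0.9, 0.2), (0.4, 0.7), (1.0, 1.0)]
print(box(sources))      # (0.4, 1.0)
print(diamond(sources))  # (1.0, 0.2)
# Box phi coincides with neg Diamond neg phi, also when there are no sources.
for srcs in (sources, []):
    assert box(srcs) == neg(diamond([neg(v) for v in srcs]))
```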

The other part of our motivation comes from the work on modal Gödel logic ($\mathbf{GK}$ in the notation of [36]) equipped with relational semantics [12,13,36]. There, the authors develop the proof and model theory of modal expansions of $\mathbf{G}$ interpreted over frames with both crisp and fuzzy accessibility relations. In particular, it was shown that the $\Box$-fragment² of $\mathbf{GK}$ lacks the finite model property (FMP) w.r.t. fuzzy frames, while the $\Diamond$-fragment has FMP³ only w.r.t. fuzzy (but not crisp) frames. Furthermore, both the $\Box$ and $\Diamond$ fragments of $\mathbf{GK}$ are PSPACE-complete [28,29].


1 This logic was introduced several times: by Wansing [38] as $\mathbf{I}_4\mathbf{C}_4$ and then by Leitgeb [27] as HYPE. Cf. [33] for a recent and more detailed discussion.

2 Note that $\Box$ and $\Diamond$ are not interdefinable in $\mathbf{GK}$; cf. [36, Lemma 6.1] for details.

3 There is, however, a semantics in [11] w.r.t. which bi-modal $\mathbf{GK}$ has FMP.
Description Gödel logics, a notational variant of modal logics, have found use in the field of knowledge representation [8–10], in particular, in the representation of vague or uncertain data, which is not possible in classical ontologies. In this respect, our paper provides a further extension of representable data types, as we model not only vague reasoning but also non-trivial reasoning with inconsistent information.

In the present paper, we expand the language with the Gödel coimplication $\Yleft$ to allow for the formalisation of statements expressing that an agent is strictly more confident in one statement than in another one (cf. Sect. 2 for the details). Furthermore, the presence of $\Yleft$ will allow us to simplify frame definability. Still, we will show that our logic is a conservative extension of $\mathbf{GK}^{\mathsf{c}}$, the modal Gödel logic of crisp frames from [36] in the language with both $\Box$ and $\Diamond$.


Logics. We discuss many logics obtained from the propositional Gödel logic $\mathbf{G}$. Our main interest is in the logic we denote $\mathbf{KG}^2$. It can be produced from $\mathbf{G}$ in several ways: (1) adding De Morgan negation $\neg$ to obtain $\mathbf{G}^2$ (in which case $\phi \Yleft \phi'$ can be defined as $\neg(\neg\phi' \to \neg\phi)$) and then further expanding the language with $\Box$ or $\Diamond$; (2) adding $\Yleft$ or $\Delta$ (Baaz’ delta) to $\mathbf{G}$, then both $\Box$ and $\Diamond$, thus acquiring $\mathbf{KbiG}$ (modal bi-Gödel logic)⁴, which is further enriched with $\neg$. These and other relations are shown in Fig. 2.


Fig. 2. Logics in the article. $\mathsf{f}$ stands for ‘permitting fuzzy frames’. Subscripts on arrows denote language expansions. / stands for ‘or’ and comma for ‘and’.


Plan of the Paper. The remainder of the paper is structured as follows. In Sect. 2, we define bi-Gödel algebras and use them to present $\mathbf{KbiG}$ (on both fuzzy and crisp frames) and then $\mathbf{KG}^2$ (on crisp frames), show how to formalise statements where beliefs of agents are compared, and prove some semantical properties. In Sect. 3, we show that the $\Diamond$ fragment of $\mathbf{KbiG}^{\mathsf{f}}$ ($\mathbf{KbiG}$ on fuzzy frames) lacks the finite model property. We then present a finitely branching fragment of $\mathbf{KG}^2$ ($\mathbf{KG}^2_{\mathsf{fb}}$) and argue for its use in the representation of agents’ beliefs. In Sect. 4, we design a constraint tableaux calculus for $\mathbf{KG}^2_{\mathsf{fb}}$, which we use to obtain the complexity results. Finally, in Sect. 5, we discuss further lines of research.


4 To the best of our knowledge, the only work on bi-Gödel (symmetric Gödel) modal logic is [20]. There, the authors propose an expansion of biG with $\Box$ and $\Diamond$ equipped with a proof-theoretic interpretation and provide its algebraic semantics.
2 Language and Semantics


In this section, we present semantics for $\mathbf{KbiG}$ (modal bi-Gödel logic) over both fuzzy and crisp frames and for $\mathbf{KG}^2$ over crisp frames. Let $\mathsf{Var}$ be a countable set of propositional variables. The language $\mathsf{bi}\mathcal{L}^{\neg}_{\Box,\Diamond}$ is defined via the following grammar.


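$$\phi := p \in \mathsf{Var} \mid \neg\phi \mid (\phi \wedge \phi) \mid (\phi \vee \phi) \mid (\phi \to \phi) \mid (\phi \Yleft \phi) \mid \Box\phi \mid \Diamond\phi$$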


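Two constants, $0$ and $1$, can be introduced in the traditional fashion: $0 := p \Yleft p$, $1 := p \to p$. Likewise, the Gödel negation can also be defined as expected: ${\sim}\phi := \phi \to 0$. The $\neg$-less fragment of $\mathsf{bi}\mathcal{L}^{\neg}_{\Box,\Diamond}$ is denoted with $\mathsf{bi}\mathcal{L}_{\Box,\Diamond}$.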

To facilitate the presentation, we introduce bi-Gödel algebras.


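Definition 1. The bi-Gödel algebra $[0,1]_{\mathsf{G}} = \langle [0,1], 0, 1, \wedge_{\mathsf{G}}, \vee_{\mathsf{G}}, \to_{\mathsf{G}}, \Yleft_{\mathsf{G}}\rangle$ is defined as follows: for all $a, b \in [0,1]$, the standard operations are given by $a \wedge_{\mathsf{G}} b := \min(a, b)$ and $a \vee_{\mathsf{G}} b := \max(a, b)$,

$$a \to_{\mathsf{G}} b = \begin{cases} 1 & \text{if } a \le b\\ b & \text{else,} \end{cases} \qquad b \Yleft_{\mathsf{G}} a = \begin{cases} 0 & \text{if } b \le a\\ b & \text{else.} \end{cases}$$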
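For checking small examples by hand, Definition 1 can be transcribed directly into Python. The function names below are our own; the assertion confirms that the defined constants behave as expected, i.e., $a \Yleft_{\mathsf{G}} a = 0$ and $a \to_{\mathsf{G}} a = 1$ for every $a$.

```python
def g_and(a, b):
    return min(a, b)


def g_or(a, b):
    return max(a, b)


def g_imp(a, b):
    """Gödel implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b


def g_coimp(b, a):
    """Gödel coimplication b <- a: 0 if b <= a, else b."""
    return 0.0 if b <= a else b


def g_neg(a):
    """Gödel negation ~a := a -> 0, i.e. 1 if a = 0, else 0."""
    return g_imp(a, 0.0)


grid = [i / 4 for i in range(5)]
# The constants 0 := p <- p and 1 := p -> p take their values everywhere.
assert all(g_coimp(a, a) == 0.0 and g_imp(a, a) == 1.0 for a in grid)
print(g_imp(0.7, 0.4), g_coimp(0.7, 0.4), g_neg(0.4))  # 0.4 0.7 0.0
```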


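Definition 2.

– A fuzzy frame is a tuple $\mathfrak{F} = \langle W, R\rangle$ with $W \neq \varnothing$ and $R : W \times W \to [0,1]$.
– A crisp frame is a tuple $\mathfrak{F} = \langle W, R\rangle$ with $W \neq \varnothing$ and $R \subseteq W \times W$.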


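Definition 3 ($\mathbf{KbiG}$ models). A $\mathbf{KbiG}$ model is a tuple $\mathfrak{M} = \langle W, R, v\rangle$ with $\langle W, R\rangle$ being a (crisp or fuzzy) frame, and $v : \mathsf{Var} \times W \to [0,1]$. $v$ (a valuation) is extended to complex $\mathsf{bi}\mathcal{L}_{\Box,\Diamond}$ formulas as follows: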


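$$v(\phi \circ \phi', w) = v(\phi, w) \circ_{\mathsf{G}} v(\phi', w) \qquad (\circ \in \{\wedge, \vee, \to, \Yleft\})$$

The interpretation of modal formulas on fuzzy frames is as follows:

$$v(\Box\phi, w) = \inf_{w' \in W}\{wRw' \to_{\mathsf{G}} v(\phi, w')\}, \qquad v(\Diamond\phi, w) = \sup_{w' \in W}\{wRw' \wedge_{\mathsf{G}} v(\phi, w')\}.$$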


On crisp frames, the interpretation is simpler (here, $\inf(\varnothing) = 1$ and $\sup(\varnothing) = 0$):

$$v(\Box\phi, w) = \inf\{v(\phi, w') : wRw'\}, \qquad v(\Diamond\phi, w) = \sup\{v(\phi, w') : wRw'\}.$$


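We say that $\phi \in \mathsf{bi}\mathcal{L}_{\Box,\Diamond}$ is $\mathbf{KbiG}$ valid on frame $\mathfrak{F}$ (denoted $\mathfrak{F} \models_{\mathbf{KbiG}} \phi$) iff for any $w \in \mathfrak{F}$, it holds that $v(\phi, w) = 1$ for any model $\mathfrak{M}$ on $\mathfrak{F}$.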


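Note that the definitions of validity in $\mathbf{GK}^{\mathsf{c}}$ and $\mathbf{GK}$ coincide with those in $\mathbf{KbiG}$ and $\mathbf{KbiG}^{\mathsf{f}}$, respectively, if we consider the $\Yleft$-free fragment of $\mathsf{bi}\mathcal{L}_{\Box,\Diamond}$.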

As we have already mentioned, on crisp frames, the accessibility relation $R$ can be understood as the availability of (safe or reliable) sources. In fuzzy frames, it can be thought of as the degree of trust one has in a source. Then, $\Diamond\phi$ represents the search for evidence from trusted sources that supports $\phi$: $v(\Diamond\phi, t) > 0$ iff there is $t'$ s.t. $tRt' > 0$ and $v(\phi, t') > 0$, i.e., there must be a source $t'$ to which $t$ has a positive degree of trust and that has at least some certainty in $\phi$. On the other hand, if no source is trusted by $t$ (i.e., $tRu = 0$ for all $u$), then $v(\Diamond\phi, t) = 0$. Likewise, $\Box\chi$ can be construed as the search for evidence against $\chi$ given by trusted sources: $v(\Box\chi, t) < 1$ iff there is a source $t'$ that gives $\chi$ less certainty than $t$ gives trust to $t'$. In other words, if $t$ trusts no sources, or if all sources have at least as high confidence in $\chi$ as $t$ has in them, then $t$ fails to find a trustworthy enough counterexample.
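This reading of $\Diamond$ and $\Box$ on fuzzy frames can be checked at a single state with finitely many sources. In the Python sketch below (our own, with hypothetical names), each source is given as a pair (degree of trust $tRt'$, value $v(\phi, t')$); sources with trust 0 contribute nothing to either modality and may be omitted.

```python
def g_imp(a, b):
    """Gödel implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b


def fuzzy_box(sources):
    """v(Box phi, t) = inf over t' of (tRt' ->_G v(phi, t')); inf of nothing is 1."""
    return min((g_imp(trust, val) for trust, val in sources), default=1.0)


def fuzzy_diamond(sources):
    """v(Diamond phi, t) = sup over t' of min(tRt', v(phi, t')); sup of nothing is 0."""
    return max((min(trust, val) for trust, val in sources), default=0.0)


# No trusted source: Diamond finds no support, Box finds no counterexample.
untrusted = [(0.0, 0.9), (0.0, 0.1)]
assert fuzzy_diamond(untrusted) == 0.0 and fuzzy_box(untrusted) == 1.0

# A source trusted to degree 0.8 that gives phi only 0.5 pulls Box below 1.
sources = [(0.8, 0.5), (0.3, 0.9)]
print(fuzzy_box(sources), fuzzy_diamond(sources))  # 0.5 0.5
```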


Definition 4 ($\mathbf{KG}^2$ models). A $\mathbf{KG}^2$ model is a tuple $\mathfrak{M} = \langle W, R, v_1, v_2\rangle$ with $\langle W, R\rangle$ being a crisp frame, and $v_1, v_2 : \mathsf{Var} \times W \to [0,1]$. The valuations, which we interpret as support of truth and support of falsity, respectively, are extended to complex formulas as expected.


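$$\begin{aligned}
v_1(\neg\phi, w) &= v_2(\phi, w) \qquad & v_2(\neg\phi, w) &= v_1(\phi, w)\\
v_1(\phi \wedge \phi', w) &= v_1(\phi, w) \wedge_{\mathsf{G}} v_1(\phi', w) \qquad & v_2(\phi \wedge \phi', w) &= v_2(\phi, w) \vee_{\mathsf{G}} v_2(\phi', w)\\
v_1(\phi \vee \phi', w) &= v_1(\phi, w) \vee_{\mathsf{G}} v_1(\phi', w) \qquad & v_2(\phi \vee \phi', w) &= v_2(\phi, w) \wedge_{\mathsf{G}} v_2(\phi', w)\\
v_1(\phi \to \phi', w) &= v_1(\phi, w) \to_{\mathsf{G}} v_1(\phi', w) \qquad & v_2(\phi \to \phi', w) &= v_2(\phi', w) \Yleft_{\mathsf{G}} v_2(\phi, w)\\
v_1(\phi \Yleft \phi', w) &= v_1(\phi, w) \Yleft_{\mathsf{G}} v_1(\phi', w) \qquad & v_2(\phi \Yleft \phi', w) &= v_2(\phi', w) \to_{\mathsf{G}} v_2(\phi, w)\\
v_1(\Box\phi, w) &= \inf\{v_1(\phi, w') : wRw'\} \qquad & v_2(\Box\phi, w) &= \sup\{v_2(\phi, w') : wRw'\}\\
v_1(\Diamond\phi, w) &= \sup\{v_1(\phi, w') : wRw'\} \qquad & v_2(\Diamond\phi, w) &= \inf\{v_2(\phi, w') : wRw'\}
\end{aligned}$$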


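We say that $\phi \in \mathsf{bi}\mathcal{L}^{\neg}_{\Box,\Diamond}$ is $\mathbf{KG}^2$ valid on frame $\mathfrak{F}$ ($\mathfrak{F} \models_{\mathbf{KG}^2} \phi$) iff for any $w \in \mathfrak{F}$, it holds that $v_1(\phi, w) = 1$ and $v_2(\phi, w) = 0$ for any model $\mathfrak{M}$ on $\mathfrak{F}$.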
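The propositional clauses of Definition 4 act on pairs $(v_1, v_2)$. The Python sketch below (names hypothetical) implements them and checks two claims made earlier. First, explosion fails: with $v(p) = (1,1)$ and $v(q) = (0,0)$, the formula $(p \wedge \neg p) \to q$ receives $(0,0)$ rather than the designated $(1,0)$. Second, coimplication agrees with its definition $\neg(\neg\phi' \to \neg\phi)$ on a grid of values.

```python
from itertools import product


def g_imp(a, b):
    return 1.0 if a <= b else b


def g_coimp(a, b):
    return 0.0 if a <= b else a


# A value is a pair (v1, v2): support of truth, support of falsity.
def neg(p):
    return (p[1], p[0])


def conj(p, q):
    return (min(p[0], q[0]), max(p[1], q[1]))


def disj(p, q):
    return (max(p[0], q[0]), min(p[1], q[1]))


def imp(p, q):
    # v2(phi -> phi') = v2(phi') <-_G v2(phi)
    return (g_imp(p[0], q[0]), g_coimp(q[1], p[1]))


def coimp(p, q):
    # v2(phi <- phi') = v2(phi') ->_G v2(phi)
    return (g_coimp(p[0], q[0]), g_imp(q[1], p[1]))


DESIGNATED = (1.0, 0.0)
p, q = (1.0, 1.0), (0.0, 0.0)
print(imp(conj(p, neg(p)), q))  # (0.0, 0.0)
assert imp(conj(p, neg(p)), q) != DESIGNATED  # no explosion

grid = [i / 4 for i in range(5)]
pairs = list(product(grid, repeat=2))
assert all(coimp(a, b) == neg(imp(neg(b), neg(a))) for a in pairs for b in pairs)
```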


Convention 1. In what follows, we will denote a pair of valuations $(v_1, v_2)$ just with $v$ if there is no risk of confusion. Furthermore, for each frame $\mathfrak{F}$ and each $w \in \mathfrak{F}$, we denote

$$R(w) = \{w' : wRw' = 1\} \ \text{(for fuzzy frames)}, \qquad R(w) = \{w' : wRw'\} \ \text{(for crisp frames)}.$$


Convention 2. We will further denote with $\mathbf{KbiG}$ the set of all formulas $\mathbf{KbiG}$-valid on all crisp frames; with $\mathbf{KbiG}^{\mathsf{f}}$ the set of all formulas $\mathbf{KbiG}$-valid on all fuzzy frames; and with $\mathbf{KG}^2$ the set of all formulas $\mathbf{KG}^2$-valid on all crisp frames.


Before proceeding to establish some semantical properties, let us make two remarks. First, neither $\Box$ nor $\Diamond$ is trivialised by contradictions: in contrast to $\mathbf{K}$, $\Box(p \wedge \neg p) \to \Box q$ is not $\mathbf{KG}^2$ valid, and neither is $\Diamond(p \wedge \neg p) \to \Diamond q$. Intuitively, this means that one can have contradictory but non-trivial beliefs. Second, we can formalise statements of comparative belief such as the ones given above:


wallet: I am more confident that the wallet is Paula’s than that the wallet 
is Quentin’s. 

ice cream: Alice is more certain than Britney that Claire loves pistachio 
ice cream. 
For this, consider the following defined operators.


$$\Delta\tau := {\sim}(1 \Yleft \tau) \tag{1}$$

$$\Delta^{\neg}\phi := {\sim}(1 \Yleft \phi) \wedge {\sim}\neg\phi \tag{2}$$


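It is clear that for any $\tau \in \mathsf{bi}\mathcal{L}_{\Box,\Diamond}$ and $\phi \in \mathsf{bi}\mathcal{L}^{\neg}_{\Box,\Diamond}$ interpreted on $\mathbf{KbiG}$ and $\mathbf{KG}^2$ models, respectively, it holds that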


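$$v(\Delta\tau, w) = \begin{cases} 1 & \text{if } v(\tau, w) = 1\\ 0 & \text{otherwise,} \end{cases} \qquad v(\Delta^{\neg}\phi, w) = \begin{cases} (1,0) & \text{if } v(\phi, w) = (1,0)\\ (0,1) & \text{otherwise.} \end{cases} \tag{3}$$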


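Now we can define formulas that express order relations between values of two formulas both for $\mathbf{KbiG}$ and $\mathbf{KG}^2$.
For $\mathbf{KbiG}$, they look as follows:

$$\begin{aligned} v(\tau, w) \le v(\tau', w) &\ \text{ iff }\ v(\Delta(\tau \to \tau'), w) = 1,\\ v(\tau, w) > v(\tau', w) &\ \text{ iff }\ v({\sim}\Delta(\tau \to \tau'), w) = 1. \end{aligned}$$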


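In $\mathbf{KG}^2$, the orders are defined in a more complicated way:

$$\begin{aligned} v(\phi, w) \le v(\phi', w) &\ \text{ iff }\ v(\Delta^{\neg}(\phi \to \phi'), w) = (1,0),\\ v(\phi, w) > v(\phi', w) &\ \text{ iff }\ v(\Delta^{\neg}(\phi' \to \phi) \wedge {\sim}\Delta^{\neg}(\phi \to \phi'), w) = (1,0). \end{aligned}$$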


Observe, first, that both in $\mathbf{KbiG}$ and $\mathbf{KG}^2$, the relation ‘the value of $\tau$ ($\phi$) is less than or equal to the value of $\tau'$ ($\phi'$)’ is defined as ‘$\tau \to \tau'$ ($\phi \to \phi'$) has the designated value’. In $\mathbf{KbiG}$, the strict order is just the negation of the non-strict order, since all values are comparable. On the other hand, the strict order in $\mathbf{KG}^2$ is not a simple negation of the non-strict order, since $\mathbf{KG}^2$ is essentially two-dimensional. We provide further details in Remark 2.
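Equation (3) and the order characterisations for $\mathbf{KG}^2$ can be checked numerically. The Python sketch below is a self-contained check on a finite grid (with hypothetical names), not a proof: it computes the constants $1 = (1,0)$ and $0 = (0,1)$ as pairs, builds $\Delta^{\neg}$ from definition (2), and compares the formula-based tests for $\le$ and $>$ with the coordinate-wise truth order.

```python
from itertools import product


def g_imp(a, b):
    return 1.0 if a <= b else b


def g_coimp(a, b):
    return 0.0 if a <= b else a


def neg(p):
    return (p[1], p[0])


def conj(p, q):
    return (min(p[0], q[0]), max(p[1], q[1]))


def imp(p, q):
    return (g_imp(p[0], q[0]), g_coimp(q[1], p[1]))


def coimp(p, q):
    return (g_coimp(p[0], q[0]), g_imp(q[1], p[1]))


ONE, ZERO = (1.0, 0.0), (0.0, 1.0)  # values of p -> p and p <- p


def gneg(p):
    """Gödel negation ~phi := phi -> 0."""
    return imp(p, ZERO)


def delta_neg(p):
    """Definition (2): ~(1 <- phi) AND ~(neg phi)."""
    return conj(gneg(coimp(ONE, p)), gneg(neg(p)))


def truth_leq(p, q):
    return p[0] <= q[0] and p[1] >= q[1]


grid = [i / 4 for i in range(5)]
pairs = list(product(grid, repeat=2))
for p in pairs:
    assert delta_neg(p) == (ONE if p == ONE else ZERO)  # equation (3)
    for q in pairs:
        leq = delta_neg(imp(p, q)) == ONE
        gt = conj(delta_neg(imp(q, p)), gneg(delta_neg(imp(p, q)))) == ONE
        assert leq == truth_leq(p, q)
        assert gt == (truth_leq(q, p) and not truth_leq(p, q))
```

On the grid, $\le$ coincides with the truth order of $[0,1]^{\Join}$, and $>$ holds exactly when the reverse non-strict inequality holds and the forward one fails.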

Finally, we can formalise wallet as follows. We interpret ‘I am confident’ as $\Box$ and substitute ‘the wallet is Paula’s’ with $p$, and ‘the wallet is Quentin’s’ with $q$. Now, we just use the definition of $>$ in $\mathsf{bi}\mathcal{L}^{\neg}_{\Box,\Diamond}$ to get

$$\Delta^{\neg}(\Box q \to \Box p) \wedge {\sim}\Delta^{\neg}(\Box p \to \Box q). \tag{4}$$


For ice cream, we need two different modalities: $\Box_a$ and $\Box_b$ for Alice and Britney, respectively. Replacing ‘Claire loves pistachio ice cream’ with $p$, we get

$$\Delta^{\neg}(\Box_b p \to \Box_a p) \wedge {\sim}\Delta^{\neg}(\Box_a p \to \Box_b p). \tag{5}$$


Remark 1. $\Delta$ is called Baaz’ delta (cf., e.g., [3] for more details). Intuitively, $\Delta\tau$ can be interpreted as ‘$\tau$ has the designated value’ and acts much like a necessity modality: if $\tau$ is $\mathbf{KbiG}$ valid, then so is $\Delta\tau$; moreover, $\Delta(p \to q) \to (\Delta p \to \Delta q)$ is valid. Furthermore, $\Delta$ and $\Yleft$ can be defined via one another in $\mathbf{KbiG}$; thus, the addition of $\Delta$ to $\mathbf{G}$ makes it more expressive and allows us to define both strict and non-strict orders.
Remark 2. Recall that we mentioned in Sect. 1 that an agent should usually be able to compare their beliefs in different statements: this is reflected by the fact that $\Delta(p \to q) \vee \Delta(q \to p)$ is $\mathbf{KbiG}$ valid. This can be counter-intuitive, however, if the contents of the beliefs have nothing in common.

This drawback is avoided if we treat support of truth and support of falsity independently. Here is where the difference between $\mathbf{KbiG}$ and $\mathbf{KG}^2$ lies. In $\mathbf{KG}^2$, we can only compare the values of formulas coordinate-wise, whence $\Delta^{\neg}(p \to q) \vee \Delta^{\neg}(q \to p)$ is not $\mathbf{KG}^2$ valid. E.g., if we set $v(p, w) = (0.7, 0.6)$ and $v(q, w) = (0.4, 0.2)$, then $v(p, w)$ and $v(q, w)$ will not be comparable w.r.t. the truth (upward) order on $[0,1]^{\Join}$.
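The counterexample in Remark 2 can be replayed numerically. The following Python sketch (hypothetical names; pair semantics as in Definition 4 and $\Delta^{\neg}$ as in (2)) evaluates $\Delta^{\neg}(p \to q) \vee \Delta^{\neg}(q \to p)$ at $v(p) = (0.7, 0.6)$, $v(q) = (0.4, 0.2)$ and obtains $(0,1)$ rather than the designated $(1,0)$, whereas the $\mathbf{KbiG}$ formula $\Delta(p \to q) \vee \Delta(q \to p)$, evaluated on the first coordinates alone, gets the value 1.

```python
def g_imp(a, b):
    return 1.0 if a <= b else b


def g_coimp(a, b):
    return 0.0 if a <= b else a


def neg(p):
    return (p[1], p[0])


def conj(p, q):
    return (min(p[0], q[0]), max(p[1], q[1]))


def disj(p, q):
    return (max(p[0], q[0]), min(p[1], q[1]))


def imp(p, q):
    return (g_imp(p[0], q[0]), g_coimp(q[1], p[1]))


def coimp(p, q):
    return (g_coimp(p[0], q[0]), g_imp(q[1], p[1]))


ONE, ZERO = (1.0, 0.0), (0.0, 1.0)


def delta_neg(p):
    """Definition (2): ~(1 <- phi) AND ~(neg phi), with ~x := x -> 0."""
    return conj(imp(coimp(ONE, p), ZERO), imp(neg(p), ZERO))


def delta(a):
    """Baaz' delta on [0, 1]: ~(1 <- a)."""
    return g_imp(g_coimp(1.0, a), 0.0)


p, q = (0.7, 0.6), (0.4, 0.2)
print(disj(delta_neg(imp(p, q)), delta_neg(imp(q, p))))  # (0.0, 1.0)
# In KbiG there is only one coordinate, so one direction always holds.
print(max(delta(g_imp(p[0], q[0])), delta(g_imp(q[0], p[0]))))  # 1.0
```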


We end this section by establishing some useful semantical properties.
Proposition 1. $\mathfrak{F} \models_{\mathbf{KG}^2} \phi$ iff for any model $\mathfrak{M}$ on $\mathfrak{F}$ and any $w \in \mathfrak{F}$, $v_1(\phi, w) = 1$.


Proof. The ‘only if’ direction is evident from the definition of validity. We show the ‘if’ part. It suffices to show that the following statement holds for any $\phi$ and $w \in \mathfrak{F}$:

for any $v(p, w) = (x, y)$, let $v^*(p, w) = (1 - y, 1 - x)$. Then $v(\phi, w) = (x, y)$ iff $v^*(\phi, w) = (1 - y, 1 - x)$.

We proceed by induction on $\phi$. The proof of the propositional cases is identical to the one in [5, Proposition 5]. We consider only the case of $\phi = \Box\psi$, since $\Box$ and $\Diamond$ are interdefinable.

Let $v(\Box\psi, w) = (x, y)$. Then $\inf\{v_1(\psi, w') : wRw'\} = x$ and $\sup\{v_2(\psi, w') : wRw'\} = y$. Now, we apply the induction hypothesis to $\psi$: if $v(\psi, s) = (x', y')$, then $v^*(\psi, s) = (1 - y', 1 - x')$ for any $s \in R(w)$. But then $\inf\{v^*_1(\psi, w') : wRw'\} = 1 - y$ and $\sup\{v^*_2(\psi, w') : wRw'\} = 1 - x$, as required.

Now, assume that $v_1(\phi, w) = 1$ for any $v_1$ and $w$. We show that $v_2(\phi, w) = 0$ for any $w$ and $v_2$. Assume for contradiction that $v_2(\phi, w) = y > 0$ but $v_1(\phi, w) = 1$. Then $v^*(\phi, w) = (1 - y, 1 - 1) = (1 - y, 0)$. But since $y > 0$, $v^*_1(\phi, w) = 1 - y < 1$, which contradicts the assumption that $v_1(\phi, w) = 1$ for any valuation.
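The key step of the proof is that the dual valuation $v^*$ commutes with every connective. The Python sketch below is our own sanity check, not part of the proof: it verifies this on a grid of values for all propositional connectives and for $\Box$ over two successors, using `fractions.Fraction` so that $1 - x$ is computed exactly.

```python
from fractions import Fraction
from itertools import product


def g_imp(a, b):
    return 1 if a <= b else b


def g_coimp(a, b):
    return 0 if a <= b else a


def neg(p):
    return (p[1], p[0])


def conj(p, q):
    return (min(p[0], q[0]), max(p[1], q[1]))


def disj(p, q):
    return (max(p[0], q[0]), min(p[1], q[1]))


def imp(p, q):
    return (g_imp(p[0], q[0]), g_coimp(q[1], p[1]))


def coimp(p, q):
    return (g_coimp(p[0], q[0]), g_imp(q[1], p[1]))


def box(succ):
    """Box over a non-empty finite list of successor values."""
    return (min(x for x, _ in succ), max(y for _, y in succ))


def star(v):
    """The dual valuation: (x, y) becomes (1 - y, 1 - x)."""
    return (1 - v[1], 1 - v[0])


grid = [Fraction(i, 4) for i in range(5)]
pairs = list(product(grid, repeat=2))
for p in pairs:
    assert star(neg(p)) == neg(star(p))
    for q in pairs:
        for op in (conj, disj, imp, coimp):
            assert star(op(p, q)) == op(star(p), star(q))
        assert star(box([p, q])) == box([star(p), star(q)])
print("v* commutes with all connectives on the grid")
```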


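Proposition 2.

1. Let $\phi$ be a formula over $\{0, \wedge, \vee, \to, \Box, \Diamond\}$. Then, $\mathfrak{F} \models_{\mathbf{GK}^{\mathsf{c}}} \phi$ iff $\mathfrak{F} \models_{\mathbf{KbiG}} \phi$, and $\mathfrak{F} \models_{\mathbf{GK}} \phi$ iff $\mathfrak{F} \models_{\mathbf{KbiG}^{\mathsf{f}}} \phi$, for any $\mathfrak{F}$.
2. Let $\phi \in \mathsf{bi}\mathcal{L}_{\Box,\Diamond}$. Then, $\mathfrak{F} \models_{\mathbf{KbiG}} \phi$ iff $\mathfrak{F} \models_{\mathbf{KG}^2} \phi$, for any crisp $\mathfrak{F}$.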


Proof. 1. follows directly from the semantic conditions of Definition 3. We consider 2. The ‘if’ direction is straightforward since the semantic conditions of $v_1$ in $\mathbf{KG}^2$ models and of $v$ in $\mathbf{KbiG}$ models coincide. The ‘only if’ direction follows from Proposition 1: if $\phi$ is $\mathbf{KbiG}$ valid on $\mathfrak{F}$, then $v(\phi, w) = 1$ for any $w \in \mathfrak{F}$ and any $v$ on $\mathfrak{F}$. But then, $v_1(\phi, w) = 1$ for any $w \in \mathfrak{F}$. Hence, $\mathfrak{F} \models_{\mathbf{KG}^2} \phi$.


3 Model-Theoretic Properties of $\mathbf{KG}^2$


In the previous section, we have seen how the addition of $\Yleft$ allowed us to formalise statements comparing beliefs. Here, we will show that both the $\Box$ and $\Diamond$ fragments of $\mathbf{KbiG}$, and hence $\mathbf{KG}^2$, are strictly more expressive than the classical modal logic $\mathbf{K}$, i.e., that they can define all classically definable classes of crisp frames as well as some undefinable ones.
Definition 5 (Frame definability). Let $\Sigma$ be a set of formulas. $\Sigma$ defines a class of frames $\mathbb{K}$ in a logic $\mathbf{L}$ iff it holds that $\mathfrak{F} \in \mathbb{K}$ iff $\mathfrak{F} \models_{\mathbf{L}} \Sigma$.


The next statement follows from Proposition 2, since $\mathbf{K}$ can be faithfully embedded in $\mathbf{GK}^{\mathsf{c}}$ by substituting each variable $p$ with ${\sim}{\sim}p$ (cf. [28,29] for details).


Theorem 1. Let $\mathbb{K}$ be a class of frames definable in $\mathbf{K}$. Then, $\mathbb{K}$ is definable in $\mathbf{KbiG}$ and $\mathbf{KG}^2$.


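Theorem 2. 1. Let $\mathfrak{F}$ be crisp. Then $\mathfrak{F}$ is finitely branching (i.e., $R(w)$ is finite for every $w \in \mathfrak{F}$) iff $\mathfrak{F} \models_{\mathbf{KbiG}} 1 \Yleft \Diamond((p \Yleft q) \wedge q)$.

2. Let $\mathfrak{F}$ be fuzzy. Then $\mathfrak{F}$ is finitely branching and $\sup\{wRw' : wRw' < 1\} < 1$ for all $w \in \mathfrak{F}$ iff $\mathfrak{F} \models_{\mathbf{KbiG}} 1 \Yleft \Diamond((p \Yleft q) \wedge q)$.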


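Proof. We show the case of fuzzy frames, since the crisp ones can be tackled in the same manner. Assume that $\mathfrak{F}$ is finitely branching and that $\sup\{wRw' : wRw' < 1\} < 1$ for all $w \in \mathfrak{F}$. It suffices to show that $v(\Diamond((p \Yleft q) \wedge q), w) < 1$ for all $w \in \mathfrak{F}$. First of all, observe that there is no $w' \in \mathfrak{F}$ s.t. $v((p \Yleft q) \wedge q, w') = 1$: if $v(p, w') \le v(q, w')$, the value is 0, and otherwise it equals $v(q, w') < v(p, w') \le 1$. It is clear that

$$\sup_{wRw' < 1}\{v((p \Yleft q) \wedge q, w') \wedge_{\mathsf{G}} wRw'\} < 1$$

and that

$$\sup\{v((p \Yleft q) \wedge q, w') : wRw' = 1\} = \max\{v((p \Yleft q) \wedge q, w') : wRw' = 1\} < 1,$$

since $R(w)$ is finite. But then $v(\Diamond((p \Yleft q) \wedge q), w) < 1$, as required.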

For the converse, either (1) $R(w)$ is infinite for some $w$, or (2) $\sup\{wRw' : wRw' < 1\} = 1$ for some $w$. For (1), set $v(p, w') = 1$ for every $w' \in R(w)$. Now let $W' \subseteq R(w)$ with $W' = \{w_i : i \in \{1, 2, \ldots\}\}$. We set $v(q, w_i) = \frac{i}{i+1}$. It is easy to see that $\sup\{v(q, w_i) : w_i \in W'\} = 1$ and that $v((p \Yleft q) \wedge q, w_i) = v(q, w_i)$. Therefore, $v(1 \Yleft \Diamond((p \Yleft q) \wedge q), w) = 0$.

For (2), we let $v(p, w') = 1$ and, further, $v(q, w') = wRw'$ for all $w' \in \mathfrak{F}$. Now, since $\sup\{wRw' : wRw' < 1\} = 1$ and $v((p \Yleft q) \wedge q, w') = v(q, w')$ for all $w'$ with $wRw' < 1$, it follows that $v(\Diamond((p \Yleft q) \wedge q), w) = 1$, whence $v(1 \Yleft \Diamond((p \Yleft q) \wedge q), w) = 0$.
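The proof rests on two facts about $(p \Yleft q) \wedge q$: it never takes the value 1 at a single state, yet its values can approach 1 across infinitely many states. The Python sketch below (hypothetical names; exact fractions; crisp successors only) checks the first fact on a grid. It then takes $v(p, w_i) = 1$ and $v(q, w_i) = \frac{i}{i+1}$ on $N$ successors, one choice of values whose supremum is 1, and shows that $\Diamond$ yields $\frac{N}{N+1} < 1$. Hence $1 \Yleft \Diamond((p \Yleft q) \wedge q)$ stays 1 for every finite $N$; only the supremum over infinitely many successors reaches 1.

```python
from fractions import Fraction


def g_coimp(a, b):
    """Gödel coimplication a <- b: 0 if a <= b, else a."""
    return 0 if a <= b else a


def witness(p, q):
    """Value of (p <- q) AND q at one state, given v(p) = p and v(q) = q."""
    return min(g_coimp(p, q), q)


grid = [Fraction(i, 8) for i in range(9)]
assert all(witness(p, q) < 1 for p in grid for q in grid)


def defining_formula(n_successors):
    """Return (value of 1 <- Diamond((p <- q) AND q), value of the Diamond)
    at a state whose successors w_1..w_N have p = 1 and q = i / (i + 1)."""
    diamond = max(witness(Fraction(1), Fraction(i, i + 1))
                  for i in range(1, n_successors + 1))
    return g_coimp(1, diamond), diamond


for n in (1, 10, 1000):
    print(n, defining_formula(n))
```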


Remark 3. The obvious corollary of Theorem 2 is the lack of FMP for the $\Diamond$-fragment of $\mathbf{KbiG}^{\mathsf{f}}$, since $\Diamond((p \Yleft q) \wedge q)$ is never true in a finite model. This differentiates $\mathbf{KbiG}^{\mathsf{f}}$ from $\mathbf{GK}$, since the $\Diamond$-fragment of $\mathbf{GK}$ has FMP [12, Theorem 7.1]. Moreover, one can define finitely branching frames in the $\Box$ fragments of $\mathbf{GK}$ and $\mathbf{GK}^{\mathsf{c}}$. Indeed, ${\sim}{\sim}\Box(p \vee {\sim}p)$ serves as such a definition.


Corollary 1. $\mathbf{KG}^2$ and both the $\Box$ and $\Diamond$ fragments of $\mathbf{KbiG}$ are strictly more expressive than $\mathbf{K}$.


Proof. From Theorems 1 and 2, since $\mathbf{K}$ is complete both w.r.t. all frames and w.r.t. all finitely branching frames. The result for $\mathbf{KG}^2$ follows since it is conservative over $\mathbf{KbiG}$ (Proposition 2).


5 Bi-modal $\mathbf{KbiG}$ lacks FMP since it is a conservative extension of $\mathbf{GK}^{\mathsf{c}}$.
These results show that the addition of $\Yleft$ greatly enhances the expressive power of our logic. Here, it is instructive to remind ourselves that classical epistemic logics are usually complete w.r.t. finitely branching frames (cf. [18] for details). This is reasonable since, for practical reasoning, agents cannot consider infinitely many alternatives. In our case, however, if we wish to use $\mathbf{KbiG}$ and $\mathbf{KG}^2$ for knowledge representation, we need to impose finite branching explicitly.

Furthermore, allowing for infinitely branching frames in $\mathbf{KbiG}$ or $\mathbf{KG}^2$ leads to counter-intuitive consequences. In particular, it is possible that $v(\Box\phi, w) = (0, 1)$ even though there are no $w', w'' \in R(w)$ s.t. $v_1(\phi, w') = 0$ or $v_2(\phi, w'') = 1$. In other words, there is no source that decisively falsifies $\phi$; furthermore, all sources have some evidence for $\phi$, and yet we somehow believe that $\phi$ is completely false and untrue. Dually, it is possible that $v(\Diamond\phi, w) = (1, 0)$ although there are no $w', w'' \in R(w)$ s.t. $v_1(\phi, w') = 1$ or $v_2(\phi, w'') = 0$. Even though $\Diamond$ is an ‘optimistic’ aggregation, it should not ignore the fact that all sources have some evidence against $\phi$ but none supports it completely.

Of course, this situation is impossible if we consider only finitely branching frames, for infima and suprema then become minima and maxima. There, all values of modal formulas are witnessed by some accessible states in the following sense: for $\heartsuit \in \{\Box, \Diamond\}$ and $i \in \{1, 2\}$, if $v_i(\heartsuit\phi, w) = x$ and $R(w) \neq \varnothing$, then there is $w' \in R(w)$ s.t. $v_i(\phi, w') = x$. Intuitively speaking, finitely branching frames represent the situation when our degree of certainty in some statement is based uniquely on the data given by the sources.


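Convention 3. We will further use $\mathbf{KbiG}_{\mathsf{fb}}$ and $\mathbf{KG}^2_{\mathsf{fb}}$ to denote the sets of all $\mathsf{bi}\mathcal{L}_{\Box,\Diamond}$ and $\mathsf{bi}\mathcal{L}^{\neg}_{\Box,\Diamond}$ formulas, respectively, valid on finitely branching crisp frames.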


Observe, moreover, that $\Box$ and $\Diamond$ are still not definable via one another in $\mathsf{bi}\mathcal{L}_{\Box,\Diamond}$. The proof is the same as that of [36, Lemma 6.1].


Proposition 3. $\Box$ and $\Diamond$ are not interdefinable in $\mathbf{KbiG}_{\mathsf{fb}}$.


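Corollary 2.

1. $\Box$ and $\Diamond$ are not interdefinable in $\mathbf{KbiG}$, $\mathbf{KbiG}^{\mathsf{f}}_{\mathsf{fb}}$, and $\mathbf{KbiG}^{\mathsf{f}}$.
2. Both the $\Box$ and $\Diamond$ fragments of $\mathbf{KbiG}$ are more expressive than $\mathbf{K}$.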


In the remainder of the paper, we are going to provide a complete proof system for $\mathbf{KG}^2_{\mathsf{fb}}$ (and hence, $\mathbf{KbiG}_{\mathsf{fb}}$), and establish its decidability and complexity as well as the finite model property. Note, however, that the latter is not to be taken for granted. In fact, several expected ways of defining filtration (cf. [7,14] for more details thereon) fail.

Let $\Sigma \subseteq \mathsf{bi}\mathcal{L}_{\Box,\Diamond}$ be closed under subformulas. If we want to have filtration for $\mathbf{KbiG}_{\mathsf{fb}}$, there are three intuitive ways to define a relation $\sim_\Sigma$ on the carrier of a model that is supposed to relate states satisfying the same formulas.


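1. $w \sim^1_\Sigma w'$ iff $v(\phi, w) = v(\phi, w')$ for all $\phi \in \Sigma$.
2. $w \sim^2_\Sigma w'$ iff $v(\phi, w) = 1 \Leftrightarrow v(\phi, w') = 1$ for all $\phi \in \Sigma$.
3. $w \sim^3_\Sigma w'$ iff $v(\phi, w) \le v(\phi', w) \Leftrightarrow v(\phi, w') \le v(\phi', w')$ for all $\phi, \phi' \in \Sigma \cup \{0, 1\}$.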


Paraconsistent Gödel Modal Logic 439 


Consider the model in Fig. 3 and two formulas:

$$\phi_{\to} = {\sim}{\sim}(p \to \Diamond p), \qquad \phi_{\Yleft} = {\sim}{\sim}(p \Yleft \Diamond p).$$


Now let $\Sigma$ be the set of all subformulas of $\phi_{\to} \wedge \phi_{\Yleft}$.

First of all, it is clear that $v(\phi_{\to} \wedge \phi_{\Yleft}, w) = 1$ for any $w \in \mathfrak{M}$. Observe now that all states in $\mathfrak{M}$ are distinct w.r.t. $\sim^1_\Sigma$. Thus, the first way of constructing the carrier of the new model does not give the FMP.


$\mathfrak{M}$: $w_1 \to w_2 \to \cdots \to w_n \to \cdots$


Fig. 3. $v(p, w_n) = \frac{1}{n+1}$.

As regards $\sim^2_\Sigma$ and $\sim^3_\Sigma$, one can check that for any $w, w' \in \mathfrak{M}$, it holds that $w \sim^2_\Sigma w'$ and $w \sim^3_\Sigma w'$. So, if we construct a filtration of $\mathfrak{M}$ using equivalence classes of either of these two relations, the carrier of the resulting model is going to be finite. Even more so, it is going to be a singleton.

However, we can show that there is no finite model $\mathfrak{N} = \langle U, S, e\rangle$ s.t.

$$\forall s \in \mathfrak{N} : e(\phi_{\to} \wedge \phi_{\Yleft}, s) = 1.$$


Indeed, in a finite model, $e(\phi_{\to}, t) = 1$ iff $e(p, t) = 0$ or $e(p, t') > 0$ for some $t' \in S(t)$, while $e(\phi_{\Yleft}, t) = 1$ iff $e(p, t) > e(p, t')$ for all $t' \in S(t)$. Now, if $U$ is finite, we have two options: either (1) there is $u \in U$ s.t. $S(u) = \varnothing$, or (2) $U$ contains a finite $S$-cycle.

For (1), note that $e(\Diamond p, u) = 0$, and we have two options: if $e(p, u) = 0$, then $e(\phi_{\Yleft}, u) = 0$; if, on the other hand, $e(p, u) > 0$, then $e(\phi_{\to}, u) = 0$. For (2), assume w.l.o.g. that the $S$-cycle looks as follows: $u_0 S u_1 S \ldots S u_n S u_0$.

If $e(p, u_0) = 0$, then $e(\phi_{\Yleft}, u_0) = 0$, so $e(p, u_0) > 0$. Furthermore, $e(p, u_i) > e(p, u_{i+1})$; otherwise, again, $e(\phi_{\Yleft}, u_i) = 0$. But then $e(p, u_n) < e(p, u_0)$, and we have $e(\phi_{\Yleft}, u_n) = 0$.

But this means that $\sim^2_\Sigma$ and $\sim^3_\Sigma$ do not preserve the truth of formulas from $w$ to $[w]_\Sigma$, i.e., neither of these two relations can be used to define filtration. Thus, in order to explicitly prove the finite model property and establish complexity evaluations for $\mathbf{KbiG}_{\mathsf{fb}}$ and $\mathbf{KG}^2_{\mathsf{fb}}$, we will provide a tableaux calculus. It will also serve as a decision procedure for satisfiability and validity of formulas.
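The argument against finite models can be replayed on finite fragments of $\mathfrak{M}$. The Python sketch below uses hypothetical names and assumes the reading $\phi_{\to} = {\sim}{\sim}(p \to \Diamond p)$, $\phi_{\Yleft} = {\sim}{\sim}(p \Yleft \Diamond p)$ with $v(p, w_n) = \frac{1}{n+1}$. It evaluates $\phi_{\to} \wedge \phi_{\Yleft}$ on a finite chain, where the dead end violates it as in case (1), and on a finite cycle, where the state closing the cycle violates it as in case (2).

```python
from fractions import Fraction


def g_imp(a, b):
    return 1 if a <= b else b


def g_coimp(a, b):
    return 0 if a <= b else a


def g_neg(a):
    """Gödel negation ~a := a -> 0."""
    return g_imp(a, 0)


def holds(val, succ, w):
    """Does ~~(p -> Diamond p) AND ~~(p <- Diamond p) take value 1 at w?"""
    dia = max((val[u] for u in succ[w]), default=0)  # crisp Diamond p
    phi_imp = g_neg(g_neg(g_imp(val[w], dia)))
    phi_coimp = g_neg(g_neg(g_coimp(val[w], dia)))
    return min(phi_imp, phi_coimp) == 1


# Finite chain w_1 -> ... -> w_N with v(p, w_n) = 1 / (n + 1).
N = 6
chain_val = {n: Fraction(1, n + 1) for n in range(1, N + 1)}
chain_succ = {n: [n + 1] if n < N else [] for n in range(1, N + 1)}
print([holds(chain_val, chain_succ, n) for n in range(1, N + 1)])

# Cycle u_0 -> u_1 -> u_2 -> u_0 with strictly decreasing values of p.
cyc_val = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 4)}
cyc_succ = {0: [1], 1: [2], 2: [0]}
print([holds(cyc_val, cyc_succ, u) for u in range(3)])
```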


4 Tableaux for $\mathbf{KG}^2_{\mathsf{fb}}$


Usually, proof theory for modal and many-valued logics is presented in one of the following forms. The first one is a Hilbert-style axiomatisation, as given in, e.g., [23] for the propositional Gödel logic and in [12,13,36] for its modal expansions. Hilbert calculi are useful for establishing frame correspondence results as well as for showing that one logic extends another one in the same language. On the other hand, their completeness proofs might be quite complicated, and the proof search is not at all straightforward. Second, there are non-labelled sequent and hypersequent calculi (cf. [30] for the propositional proof systems and [28,29] for the modal hypersequent calculi). With regard to modal logics, completeness proofs of (hyper)sequent calculi often provide the answer to the decidability problem. Furthermore, the proof search can be quite straightforwardly automated, provided that the calculus is cut-free.

Finally, there are proof systems that directly incorporate semantics: in particular, tableaux (e.g., the ones for Gödel logics [2] and tableaux for Łukasiewicz description logic [25]) and labelled sequent calculi (cf., e.g., [32] for labelled sequent calculi for classical modal logics). Because of the calculi’s nature, their completeness proofs are usually simple. Besides, the calculi serve as a decision procedure that either establishes that the given formula is valid or provides an explicit countermodel.

Our tableaux system $\mathcal{T}(\mathbf{KG}^2_{\mathsf{fb}})$ is a straightforward modal expansion of the constraint tableaux for $\mathbf{G}^2$ presented in [5]. It is inspired by constraint tableaux for Łukasiewicz logics from [21,22] (but cf. [26] for an approach similar to ours), which we modify with two-sorted labels corresponding to the support of truth and support of falsity in the model. This idea comes from tableaux for the Belnap–Dunn logic by D’Agostino [1]. Moreover, since $\mathbf{KG}^2_{\mathsf{fb}}$ is a conservative extension of $\mathbf{KbiG}_{\mathsf{fb}}$, our calculus can be used for that logic as well if we apply only the rules that govern the support of truth of $\mathsf{bi}\mathcal{L}_{\Box,\Diamond}$ formulas.


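Definition 6 ($\mathcal{T}(\mathbf{KG}^2_{\mathsf{fb}})$). We fix a set of state-labels $\mathsf{W}$ and let $\trianglelefteq \in \{\le, <\}$ and $\trianglerighteq \in \{\ge, >\}$. Let further $w \in \mathsf{W}$, $x \in \{1, 2\}$, $\phi \in \mathsf{bi}\mathcal{L}^{\neg}_{\Box,\Diamond}$, and $c \in \{0, 1\}$. A structure is either $w{:}x{:}\phi$ or $c$. We denote the set of structures with $\mathsf{Str}$.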

We define a constraint tableau as a downward branching tree whose branches are sets containing the following types of entries:

– relational constraints of the form $wRw'$ with $w, w' \in \mathsf{W}$;
– structural constraints of the form $X \trianglelefteq X'$ with $X, X' \in \mathsf{Str}$.


Each branch can be extended by an application of a rule⁶ from Fig. 4 or Fig. 5. A tableau’s branch $\mathcal{B}$ is closed iff one of the following conditions applies:

– the transitive closure of $\mathcal{B}$ under $\trianglelefteq$ contains $X < X$;
– $0 \ge 1 \in \mathcal{B}$, or $X > 1 \in \mathcal{B}$, or $X < 0 \in \mathcal{B}$.


A tableau is closed iff all its branches are closed. We say that there is a tableau proof of $\phi$ iff there is a closed tableau starting from the constraint $w{:}1{:}\phi < 1$. An open branch $\mathcal{B}$ is complete iff the following condition is met.

∗ If all premises of a rule occur on $\mathcal{B}$, then one of its conclusions⁷ occurs on $\mathcal{B}$.


Remark 4. Note that, due to Proposition 1, we need to check only one valuation of φ to verify its validity.


Convention 4 (Interpretation of constraints). The following table gives the interpretations of structural constraints, taking ⩽ as an example.


⁶ If X ⩽ 1 and X ⩽ X' (or 0 ⩽ X' and X ⩽ X') occur on B, then the rules are applied only to X ⩽ X'.
⁷ Note that branching rules have two conclusions.
Fig. 4. Propositional rules of T(KG²_fb). Bars denote branching.
entry | interpretation
w:1:φ ⩽ w':2:φ' | v1(φ, w) ⩽ v2(φ', w')
w:2:φ ⩽ c | v2(φ, w) ⩽ c with c ∈ {0, 1}


As one can see from Figs. 4 and 5, the rules follow the semantic conditions from Definition 4. Let us discuss →₁⩽ and □₁≲ in more detail.

The premise of →₁⩽ is interpreted as v1(φ → φ', w) ⩽ X. To decompose the implication, we check two options: either X = 1 (then the value of φ → φ' is arbitrary) or X < 1. In the second case, we use the semantics to obtain that

v1(φ', w) ⩽ X and v1(φ, w) > v1(φ', w).
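The case split used in →₁⩽ can be checked mechanically. The sketch below is an illustration only: it assumes the standard Gödel implication for v1 (value 1 if v1(φ, w) ⩽ v1(φ', w), otherwise v1(φ', w)), since Definition 4 is not reproduced in this excerpt, and the function names are hypothetical. It verifies on a finite grid of rational values that, whenever X < 1, v1(φ → φ', w) ⩽ X holds exactly when v1(φ', w) ⩽ X and v1(φ, w) > v1(φ', w); a grid check illustrates the equivalence but does not prove it.

```python
from fractions import Fraction
from itertools import product


def goedel_imp(a, b):
    """Goedel implication on [0, 1]: 1 if a <= b, otherwise b."""
    return Fraction(1) if a <= b else b


def decomposition_holds(grid):
    """Check the second branch of ->_1<= on every triple from the grid.

    For a = v1(phi, w), b = v1(phi', w) and a bound x < 1, the premise
    imp(a, b) <= x should hold exactly when b <= x and a > b.
    """
    for a, b, x in product(grid, repeat=3):
        if x < 1:
            premise = goedel_imp(a, b) <= x
            conclusions = b <= x and a > b
            if premise != conclusions:
                return False
    return True


grid = [Fraction(k, 10) for k in range(11)]
print(decomposition_holds(grid))  # prints True
```

When X = 1 the premise holds for every value of φ → φ', which is why that branch imposes no further constraints.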


Fig. 5. Modal rules of T(KG²_fb). w'' is fresh on the branch.


In order to apply □₁≲ to w:1:□φ ≲ X, we introduce a new state w'' that is accessible from w. Since we work in a finitely branching model, w'' can witness the value of □φ. Thus, we add w'':1:φ ≲ X.
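Why finite branching matters for □₁≲ can be seen in a small sketch. It assumes the usual crisp-frame clause v1(□φ, w) = inf{v1(φ, w') : wRw'}, with the infimum of the empty set taken to be 1 (the semantic clause is not restated in this excerpt), and the helper names `box_value` and `witness` are hypothetical. With finitely many successors the infimum is a minimum, so some successor attains it and can serve as w''.

```python
from fractions import Fraction


def box_value(succ_values):
    """v1(box phi, w) on a crisp frame, assuming the infimum clause.

    succ_values maps each successor w' of w to v1(phi, w'). With finitely
    many successors the infimum is a minimum; with none it is 1.
    """
    return min(succ_values.values(), default=Fraction(1))


def witness(succ_values):
    """Return a successor w'' with v1(phi, w'') == v1(box phi, w), or None."""
    target = box_value(succ_values)
    for state, value in succ_values.items():
        if value == target:
            return state
    return None  # happens only when w has no successors


succ = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1)}
print(box_value(succ), witness(succ))  # prints 1/4 w2
```

In an infinitely branching model this can fail: if the successors take the values 1, 1/2, 1/3, ..., the infimum 0 is attained by none of them, so no single state witnesses the value of □φ.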

We also provide an example of how our tableaux work. In Fig. 6, one can see a successful proof on the left and a failed proof on the right.


Fig. 6. × indicates closed branches; ⊙ indicates complete open branches.
Definition 7 (Branch realisation). We say that a model M = ⟨W, R, v1, v2⟩ with W = {w : w occurs on B} and R = {⟨w, w'⟩ : wRw' ∈ B} realises a branch B of a tree iff the following conditions are met.

- v_x(φ, w) ≲ v_{x'}(φ', w') for any w:x:φ ≲ w':x':φ' ∈ B with x, x' ∈ {1, 2}.
- v_x(φ, w) ≲ c for any w:x:φ ≲ c ∈ B with c ∈ {0, 1}.


Theorem 3 (Completeness). φ is KG²_fb valid iff it has a T(KG²_fb) proof.


Proof. We consider only the KG²_fb case since KbiG_fb can be handled the same way. For soundness, we check that if the premise of a rule is realised, then so is at least one of its conclusions. We consider the cases of →₁⩽ and □₁≲. Assume that w:1:φ → φ' ⩽ X is realised and assume w.l.o.g. that X = u:2:ψ. Clearly, either v2(ψ, u) = 1 or v2(ψ, u) < 1. In the first case, X ⩾ 1 is realised. In the second case, we have that v1(φ, w) > v1(φ', w) and v1(φ', w) ⩽ v2(ψ, u). Thus, X < 1, w:1:φ > w:1:φ', and w:1:φ' ⩽ u:2:ψ are realised as well, as required.

For □₁≲, assume that w:1:□φ ≲ X is realised and assume w.l.o.g. that X = u:2:ψ. Thus, v1(□φ, w) ≲ v2(ψ, u). Then, since the model is finitely branching, there is an accessible state w'' s.t. v1(φ, w'') ≲ v2(ψ, u). Thus, w'':1:φ ≲ X is realised too.

As no closed branch is realisable, the result follows. 

For completeness, we show that every complete open branch B is realisable. We construct the model as follows. We let W = {w : w occurs in B}, and set R = {⟨w, w'⟩ : wRw' ∈ B}. Now, it remains to construct suitable valuations.

For i ∈ {1, 2}, if w:i:p ⩾ 1 ∈ B, we set v_i(p, w) = 1. If w:i:p ⩽ 0 ∈ B, we set v_i(p, w) = 0. To set the values of the remaining variables q_1, ..., q_n, we proceed as follows. Denote by B⁺ the transitive closure of B under ≲ and let

$$[w{:}x{:}q_i] = \left\{ w'{:}x'{:}q_j \;\middle|\; \begin{array}{l} w{:}x{:}q_i \geqslant w'{:}x'{:}q_j \in B^{+} \text{ and } w{:}x{:}q_i > w'{:}x'{:}q_j \notin B^{+} \\ \quad \text{or} \\ w{:}x{:}q_i \leqslant w'{:}x'{:}q_j \in B^{+} \text{ and } w{:}x{:}q_i < w'{:}x'{:}q_j \notin B^{+} \end{array} \right\}$$

It is clear that there are at most 2·n·|W| classes [w:x:q_j], since the only possible loop in B⁺ is w_{i_1}:x:r ⩽ ... ⩽ w_{i_1}:x:r, but in such a loop all elements belong to [w_{i_1}:x:r]. We put [w:x:q_i] ≺ [w':x':q_j] iff there are w_k:x:r ∈ [w:x:q_i] and w'_k:x':r' ∈ [w':x':q_j] s.t. w_k:x:r < w'_k:x':r' ∈ B⁺.

We now set the valuation of these variables as follows:

$$v_x(q_i, w) = \frac{\left|\{[w'{:}x'{:}q_j] : [w'{:}x'{:}q_j] \prec [w{:}x{:}q_i]\}\right|}{2 \cdot n \cdot |W|}$$

Note that if some φ contains s but B⁺ contains no inequality with it, the above definition ensures that s is evaluated at 0. Thus, all constraints containing only variables are satisfied.
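The valuation construction in the completeness proof can be sketched in Python. This is a simplified illustration, not the authors' construction verbatim: terms such as "w0:1:q1" are plain strings, constraints are triples (a, op, b) with op in {"<", "<="}, the constants 0 and 1 are assumed to have been handled separately as described above, and the classes [w:x:q_i] are approximated by groups of terms that are ⩽-related in both directions in B⁺. All function names are hypothetical. The code computes B⁺, reports a closed branch if some t < t is derivable, and otherwise assigns to each term the number of classes strictly below its own class, divided by 2·n·|W|.

```python
from fractions import Fraction


def close(terms, constraints):
    """Transitive closure of constraints (a, op, b), op in {"<", "<="}.

    rel[(a, b)] is "<" if some chain from a to b contains a strict step,
    "<=" if chains exist but none is strict; unrelated pairs are absent.
    """
    rel = {}

    def add(a, b, op):
        if rel.get((a, b)) != "<":  # never weaken a strict entry
            rel[(a, b)] = op

    for a, op, b in constraints:
        add(a, b, op)
    for k in terms:  # Floyd-Warshall over the two strictness levels
        for a in terms:
            for b in terms:
                if (a, k) in rel and (k, b) in rel:
                    strict = "<" in (rel[(a, k)], rel[(k, b)])
                    add(a, b, "<" if strict else "<=")
    return rel


def valuation(terms, constraints, n, num_states):
    """Return None for a closed branch (some t < t), else a dict term -> value."""
    rel = close(terms, constraints)
    if any(rel.get((t, t)) == "<" for t in terms):
        return None
    # Simplified classes: terms related by <= in both directions.
    cls = {t: frozenset(u for u in terms
                        if u == t or ((t, u) in rel and (u, t) in rel))
           for t in terms}
    classes = set(cls.values())

    def below(c, d):  # class c strictly precedes class d
        return any(rel.get((x, y)) == "<" for x in c for y in d)

    bound = 2 * n * num_states
    return {t: Fraction(sum(below(c, cls[t]) for c in classes), bound)
            for t in terms}


terms = ["w0:1:q1", "w1:1:q1", "w0:2:q2"]
constraints = [("w0:1:q1", "<", "w1:1:q1"),
               ("w1:1:q1", "<=", "w0:2:q2"),
               ("w0:2:q2", "<=", "w1:1:q1")]
print(valuation(terms, constraints, n=2, num_states=2))
```

With n = 2 and |W| = 2, the constraints w0:1:q1 < w1:1:q1 and w1:1:q1 ⩽ w0:2:q2 ⩽ w1:1:q1 give two classes and the values 0, 1/8 and 1/8, which satisfy all three constraints. Counting strict predecessors preserves both kinds of constraints: a strict step adds at least one class below, and a non-strict step never removes one.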

It remains to show that all other constraints are satisfied. For that, we prove that if at least one conclusion of a rule is satisfied, then so is the premise. The propositional cases are straightforward and can be tackled in the same manner as in [5, Theorem 2]. We consider only the case of □₂≳. Assume w.l.o.g. that ≳ = ⩾ and X = u:1:ψ. Since B is complete, if w:2:□φ ⩾ u:1:ψ ∈ B, then for any w' s.t. wRw' ∈ B, we have w':2:φ ⩾ u:1:ψ ∈ B, and all of them are realised by M. But then w:2:□φ ⩾ u:1:ψ is realised too, as required.


Theorem 4.

1. Let φ ∈ biL_{□,◇} be not KG²_fb valid, and let |φ| denote the number of symbols in it. Then there is a model M of size O(|φ|^{|φ|+1}) and depth O(|φ|), and w ∈ M s.t. v1(φ, w) ≠ 1.
2. KG²_fb validity and satisfiability⁸ are PSPACE-complete.


Proof. We begin with 1. By Theorem 3, if φ is not KG²_fb valid, we can build a falsifying model using tableaux. It is also clear from the rules in Fig. 5 that the depth of the constructed model is bounded from above by the maximal number of nested modalities in φ. The width of the model is bounded by the maximal number of modalities on the same level of nesting. The sharpness of the bound is obtained using the embedding of K into KG²_fb, since K is complete w.r.t. finitely branching models and it is possible to force shallow trees of exponential size in K (cf., e.g., [7, §6.7]). The embedding also entails PSPACE-hardness. It remains to tackle membership.

First, observe from the proof of Theorem 3 that φ(p_1, ..., p_n) is satisfiable (falsifiable) on M = ⟨W, R, v1, v2⟩ iff there are v1 and v2 that give variables values from

$$V = \left\{0, \frac{1}{2 \cdot n \cdot |W|}, \ldots, \frac{2 \cdot n \cdot |W| - 1}{2 \cdot n \cdot |W|}, 1\right\}$$

under which φ is satisfied (falsified).


As we mentioned, |W| is bounded from above by k^{k+1}, with k being the number of modalities in φ. Therefore, we replace structural constraints with labelled formulas of the form w:i:φ = v (v ∈ V), avoiding comparisons of values of formulas in different states. As expected, we close the branch if it contains w:i:φ = v and w:i:φ = v' for v ≠ v'.

Now we replace the rules with new ones that work with labelled formulas instead of structural constraints. Below, we give as an example the new rules for → and □⁹ (with |V| = m + 1):


[Example rules for → and □ over labelled formulas w:1:φ = v: the rule schemata are not recoverable from the extracted text.]
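Once values are restricted to the finite set V, each rule only needs to branch over finitely many value assignments. The sketch below is not the rule from the paper; it merely enumerates, for a given value of w:1:φ → φ', all pairs of values (v1(φ, w), v1(φ', w)) in V × V compatible with the Gödel implication, each pair corresponding to one possible branch. It assumes V = {0, 1/m, ..., 1} with |V| = m + 1, as above, and the standard Gödel implication for v1; the helper names are hypothetical.

```python
from fractions import Fraction


def value_set(m):
    """V = {0, 1/m, ..., (m-1)/m, 1}, so that |V| = m + 1."""
    return [Fraction(k, m) for k in range(m + 1)]


def goedel_imp(a, b):
    """Goedel implication: 1 if a <= b, otherwise b."""
    return Fraction(1) if a <= b else b


def imp_branches(target, m):
    """All pairs (v1(phi, w), v1(phi', w)) in V x V with implication value target.

    Each pair stands for one branch of a hypothetical finite-valued rule
    decomposing the labelled formula w:1:phi -> phi' = target.
    """
    V = value_set(m)
    return [(a, b) for a in V for b in V if goedel_imp(a, b) == target]


for t in value_set(2):
    print(t, imp_branches(t, 2))
```

With m = 2, the value 1 of the implication admits six pairs (all a ⩽ b), the value 1/2 admits only (1, 1/2), and the value 0 admits (1/2, 0) and (1, 0); together these cover all nine pairs in V × V.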


⁸ Satisfiability and falsifiability (non-validity) are reducible to each other using ⥽: φ is satisfiable iff ∼∼(φ ⥽ 0) is falsifiable; φ is falsifiable iff ∼∼(1 ⥽ φ) is satisfiable.

⁹ Intuitively, for a value 1 > v > 0 of □φ at w, we add a new state that witnesses v, and for a state on the branch, we guess a value smaller than v. Other modal rules can be rewritten similarly.
We now show how to build a satisfying model for φ using polynomial space. We begin with w0:1:φ = 1 and start applying propositional rules (first, those that do not require branching). If we apply a branching rule, we pick one branch and work only with it until one of the following happens: the branch is closed, in which case we pick another one; no more rules are applicable (then the model is constructed); or we need to apply a modal rule to proceed. At this stage, we need to store only the subformulas of φ with labels denoting their values at w0.

Now we guess a modal formula (say, w0:2:□ψ = v) whose decomposition requires the introduction of a new state (w1) and apply this rule. Then we apply all modal rules that use w0Rw1 as a premise (again, if those require branching, we guess only one branch) and start from the beginning with the propositional rules. If we reach a contradiction, the branch is closed. Again, the only new entries to store are subformulas of φ (now with fewer modalities), their values at w1, and a relational term w0Rw1. Since the depth of the model is O(|φ|) and since we work with modal formulas one by one, we need to store subformulas of φ with their values O(|φ|) times, so we need only O(|φ|²) space.

Finally, if no rule is applicable and there is no contradiction, we mark w0:2:□ψ = v as 'safe'. Now we delete all entries of the tableau below it and pick another unmarked modal formula that requires the introduction of a new state. Dealing with these one by one allows us to construct the model branch by branch. Since the length of each branch of the model is bounded by O(|φ|) and since we delete branches of the model once they are shown to contain no contradictions, we need only polynomial space.


We end the section with two simple observations. First, Theorems 3 and 4 
are applicable both to KbiG», and KGZ because the latter is conservative over 
the former. Secondly, since KG? and KbiG are conservative over R° and since 
K can be embedded in 68°, the lower bounds on complexity of a classical modal 
logic of some class of frames K and G? modal logic of K will coincide. 


5 Concluding Remarks 


In this paper, we developed a crisp modal expansion of the two-dimensional Gödel logic G² as well as an expansion of bi-Gödel logic with □ and ◇ both for crisp and fuzzy frames. We also established their connections with modal Gödel logics, and gave a complexity analysis of their finitely branching fragments. The next steps are to study the proof theory of KG² and KG²_fb, both in the form of Hilbert-style and sequent calculi, and to establish the decidability (or lack thereof) for the case of KG². Moreover, the two-dimensional treatment of information invites different modalities, e.g., those formalising aggregation strategies given in [6], in particular, the cautious one (where the agent takes minima/infima of both positive and negative supports of a given statement) and the confident one (whereby the maxima/suprema are taken). Last but not least, while in this paper we assumed that our access to sources is crisp, one can argue that the degree of our bias towards the given source can be formalised via fuzzy frames. Thus, it would be instructive to construct a fuzzy version of KG².
In a broader perspective, we plan to provide a general treatment of two-dimensional modal logics of uncertainty. Indeed, within our project [5,6], we are formalising reasoning with heterogeneous and possibly incomplete and inconsistent information (such as crisp or fuzzy data, personal beliefs, etc.) in a modular fashion. This modularity is required because different contexts should be treated with different logics: not only can the information itself be of various natures, but the reasoning strategies of different agents, even when applied to the same data, are not necessarily the same either. Thus, since we wish to account for this diversity, we should be able to combine different logics in our approach.
Abstract. Adding multi-modalities (called subexponentials) to linear logic enhances its power as a logical framework, which has been extensively used in the specification of, e.g., proof systems, programming languages and bigraphs. Initially, subexponentials allowed for classical, linear, affine or relevant behaviors. Recently, this framework was enhanced so as to allow for commutativity as well. In this work, we close the cycle by considering associativity. We show that the resulting system (acLLΣ) admits the (multi)cut rule, and we prove two undecidability results for fragments/variations of acLLΣ.


1 Introduction 


Resource-aware logics have been the object of passionate study for quite some time now. The motivations for this passion vary: resource consciousness is adequate for modeling steps of computation; logics have interesting algebraic semantics; calculi have nice proof-theoretic properties; multi-modalities allow for the specification of several behaviors; there are many interesting applications in linguistics, etc.

With this variety of subjects, applications and views, it is not surprising that different groups developed different systems based on different principles. For example, the Lambek calculus (L) [29] was introduced for mathematical modeling of natural language syntax, and it extends a basic categorial grammar [3,4] by a concatenation operator. Linear logic (LL) [16], originally discovered by Girard from a semantical analysis of the models of polymorphic λ-calculus, turned out to be a refinement of classical and intuitionistic logic, having the dualities of the former and constructive properties of the latter. The key point is the presence of the modalities !, ?, called exponentials in LL. In the intuitionistic version of LL, denoted by ILL, only the ! exponential is present.

L and LL were compared in [2], when Abrusci showed that the Lambek calculus coincides with a variant of the non-commutative, multiplicative version of ILL [41]. This correspondence can be lifted to also cover the additive connectives: the full (multiplicative-additive) Lambek calculus FL relates to the non-commutative multiplicative-additive version of ILL, here denoted by cLL.

In this paper we propose the sequent-based system acLLΣ, a conservative extension of cLL, where associativity is allowed only for formulas marked with a special kind of modality, determined by a subexponential signature Σ. The notation adopted is modular, uniform and scalable, in the sense that many well-known systems will appear as fragments or special cases of acLLΣ, by only modifying the signature Σ. The core fragment of acLLΣ (i.e., without the subexponentials) corresponds to the non-associative version of the full Lambek calculus, FNL [8].¹

The language of acLLΣ consists of a denumerably infinite set of propositional variables {p, q, r, ...}, the units {1, ⊤}, the binary connectives for additive conjunction and disjunction {&, ⊕}, the non-commutative multiplicative conjunction ⊗, the non-commutative linear implications {→, ←}, and the unary subexponentials !ⁱ with i belonging to a pre-ordered set of labels (I, ⪯).

Roughly speaking, subexponentials [13] are substructural multi-modalities. In LL, !A indicates that the linear formula A behaves classically, that is, it can be contracted and weakened. Labeling ! with indices allows moving one step further: the set I can be partitioned so that, in !ⁱA, A can be contracted and/or weakened. This allows for two other types of behavior (other than classical or linear): affine (only weakening) or relevant (only contraction). Pre-ordering the labels (together with an upward closure requirement) guarantees cut-elimination [42]. But then, why consider only weakening and contraction? Why not also take into account other structural properties, like commutativity or associativity? In [20,21] commutativity was added to the picture, so that in !ⁱA, A can be contracted, weakened, classical or linear, but it may also commute with the neighboring formula. In this work we consider the last missing part: associativity.

Smoothly extending cLL to allow consideration of the non-associative case is non-trivial. This requires a structural recasting of sequents: we pass from sets/multisets to lists in the non-commutative case, and to trees in the case of non-associativity [28]. As a consequence, the inference rules should act deeply over formulas in tree-structured sequents, which can be tricky in the presence of modalities [17].

On the other side, the multi-modal Lambek calculus introduced in [35,45] and extended/compiled/implemented in [18,36–38]² uses different families of connectives and contexts, distinguished by means of indices, or modes. Contexts are indexed binary trees, with formulas built from the indexed adjoint connectives {→ᵢ, ←ᵢ} and ⊗ᵢ (e.g., (A →ᵢ B, (C ⊗ᵢ D, H)ʲ)ⁱ). Each mode has its own set of logical rules (following the same rule scheme), and different structural features can be combined via the mode information on the formulas. This gives the resulting system a multi-modal flavor, but it also results in a language of binary connectives, determined by the modes. This forces an unfortunate second-level synchronization between implications and tensor, and modalities act over whole sequents, not on single formulas.

¹ The multiplicative fragment of acLLΣ is the non-associative version of Lambek's calculus, NL, introduced by Lambek himself in [30]. Both the associative calculus L and the non-associative calculus NL have their advantages and disadvantages for the analysis of natural language syntax, as we discuss in more detail in Sect. 2.2.

² The Grail family of theorem provers [37] works with a variety of modern type-logical frameworks, including multimodal type-logical grammars.

In order to attribute particular resource management properties to individual resources, explicit (classical) multi-modalities ◇ᵢ, □ᵢ were proposed in [27,33]. While such unary modalities were inspired by LL exponentials, the resemblance stops there. First of all, the logical connectives come together with structural constructors for contexts, which turns ◇ᵢ, □ᵢ into truncated forms of product and implication.

Second, ◇ᵢ, □ᵢ have a temporal behavior, in the sense that ◇□F ⇒ F and F ⇒ □◇F, which are not provable in LL using the "natural interpretation" ◇ = ?, □ = !.

In this paper, multi-modality is totally local, given by the subexponentials: the signature Σ contains the pre-ordered set of labels, together with a function stating which axioms, among weakening, contraction, exchange and associativity, are assumed for each label. Sequents will have a nested structure, corresponding to trees of formulas, and rules will be applied deeply in such structures. This not only gives the LL-based system a more modern presentation (based on nested systems, like, e.g., in [10,15]), but it also brings the notation closer to the one adopted by the Lambek community, like in [25]. Finally, it also uniformly extends several LL-based systems present in the literature, as Example 8 in the next section shows.

Designing a good system serves more than purely proof-theoretic interests: well-behaved, neat proof systems can be used in order to approach several important problems, such as interpolation, complexity and decidability. And decidability of extensions/variants/fragments of L and LL is a fascinating subject of study, since the presence or absence of substructural properties/connectives may completely change the outcome. Indeed, it is well known that LL is undecidable [32], but adding weakening (affine LL) makes the system decidable [24], while removing the additives (MELL, multiplicative exponential LL) reaches the border of knowledge: its decidability is a long-standing open problem [50]. Non-associativity also alters decidability and complexity: L is NP-complete [47], while NL is decidable in polynomial time [1,6]. Finally, the number of subexponentials also plays a role in decision problems: MELL with two subexponentials is undecidable [9].

In this work, we will present two undecidability results, both orbiting (but not encompassing) MELL/FNL. First, we show that acLLΣ containing the multiplicatives ⊗, →, the additive ⊕ and one classical subexponential (allowing contraction and weakening) is undecidable. This is a refinement of the unpublished result by Tanaka [51], which states that FNL plus one fully-powered subexponential is undecidable.

In the second undecidability result, we keep two subexponentials, but with a min- 
imalist configuration: the implicational fragment of the logic plus two subexponen- 
tials: the “main” one allowing for contraction, exchange, and associativity (weakening 
is optional), and an “auxiliary” one allowing only associativity. This is a variation of 
Chaudhuri’s result (in the non-associative, non-commutative case), making use of fewer 
connectives (tensor is not needed) and less powerful subexponentials. 
Table 1. Acronyms/decidability of systems mentioned in the paper.

Acronym | System | Decidable?
L | Lambek calculus | ✓
LL | (propositional) linear logic | ✗
ILL | intuitionistic LL | ✗
MALL | multiplicative-additive LL | ✓
iMALL | intuitionistic MALL | ✓
FL | full (multiplicative-additive) L | ✓
cLL | non-commutative iMALL | ✓
acLLΣ | non-commutative, non-associative ILL with subexponentials | —
NL | non-associative L | ✓
FNL | full (multiplicative-additive) NL | ✓
MELL | multiplicative-exponential LL | unknown
SDML | simply dependent multimodal linear logics | —
SMALCΣ | FL with subexponentials | —


The rest of the paper is organized as follows: Sect. 2 presents the system acLLΣ, showing that it has the cut-elimination property and presenting an example in linguistics; Sect. 3 shows the undecidability results; and Sect. 4 concludes the paper.

Table 1 lists the acronyms and the decidability status of all considered systems. Decidability for the cases marked with "—" depends on the signature Σ.


2 A Nested System for Non-associativity 


Similar to modal connectives, the exponential ! in ILL is not canonical [13], in the sense that if i ≠ j then !ⁱF ≢ !ʲF. Intuitively, this means that we can mark the exponential with labels taken from a set I organized in a pre-order ⪯ (i.e., reflexive and transitive), obtaining (possibly infinitely many) exponentials (!ⁱ for i ∈ I). Also as in multi-modal systems, the pre-order determines the provability relation: for a general formula F, !ᵇF implies !ᵃF iff a ⪯ b.

The algebraic structure of subexponentials, combined with their intrinsic structural properties, allows for the proposal of rich linear logic based frameworks. This opened a venue for proposing different multi-modal substructural logical systems, which have found a number of different applications. Originally [42], subexponentials could assume only weakening and contraction axioms:


C: !ⁱF → !ⁱF ⊗ !ⁱF        W: !ⁱF → 1


This allows the specification of systems with multiple contexts, which may be repre- 
sented by sets or multisets of formulas [44], as well as the specification and verification 
of concurrent systems [43], and biological systems [46]. In [20,21], non-commutative 
systems allowing commutative subexponentials were presented: 


E: (!ⁱF) ⊗ G ≡ G ⊗ (!ⁱF)

and this has many applications, e.g., in linguistics [21]. In this work, we will present a non-commutative, non-associative linear logic based system, and add the possibility of assuming associativity³


A1: !ⁱF ⊗ (G ⊗ H) ≡ (!ⁱF ⊗ G) ⊗ H        A2: (G ⊗ H) ⊗ !ⁱF ≡ G ⊗ (H ⊗ !ⁱF)


as well as commutativity and other structural properties. 

We start by presenting an adaptation of simply dependent multimodal linear logics (SDML) appearing in [31] to the non-associative/commutative case.

The language of non-commutative SDML is that of (propositional intuitionistic) 
linear logic with subexponentials [21] supplied with the left residual; or similarly, that 
of FL with subexponentials. Non-associative contexts will be organized via binary trees, 
here called structures. 


Definition 1 (Structured sequents). Structures are formulas or pairs containing structures:

Γ, Δ := F | (Γ, Δ)

where the constructors may be empty but never a singleton. An n-ary context Γ{ }···{ } is a context that contains n pairwise distinct numbered holes { } wherever a formula may otherwise occur. Given n contexts Γ1, ..., Γn, we write Γ{Γ1}···{Γn} for the context where the k-th hole in Γ{ }···{ } has been replaced by Γk (for 1 ⩽ k ⩽ n). If Γk = ∅, the hole is removed.

A structured sequent (or simply sequent) has the form Γ ⇒ F, where Γ is a structure and F is a formula.
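The tree reading of structures and the hole-filling operation of Definition 1 can be sketched in Python. The representation is an assumption made for illustration: a leaf is a formula written as a string, an internal node is a `Pair`, a 1-ary context is a structure containing the hypothetical marker `HOLE`, and filling the hole with the empty structure (`None`) removes it, collapsing the parent pair so that no singleton remains. The two example trees are the structures !A, (B, C) and (!A, B), C discussed in Example 2.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Pair:
    """An internal node (Gamma, Delta) of a structure."""
    left: "Structure"
    right: "Structure"


Structure = Union[str, Pair]  # a leaf is a formula, written as a string
HOLE = "{ }"                  # hypothetical marker for the hole of a 1-ary context


def show(s: Structure) -> str:
    """Print a structure with explicit parentheses."""
    return s if isinstance(s, str) else f"({show(s.left)}, {show(s.right)})"


def plug(ctx: Structure, filler: Optional[Structure]) -> Optional[Structure]:
    """Fill the hole of ctx; an empty filler (None) removes the hole."""
    if ctx == HOLE:
        return filler
    if isinstance(ctx, str):
        return ctx
    left, right = plug(ctx.left, filler), plug(ctx.right, filler)
    if left is None:      # a pair may not become a singleton:
        return right      # it collapses to its other child
    if right is None:
        return left
    return Pair(left, right)


left_tree = Pair("!A", Pair("B", "C"))   # !A, (B, C)
right_tree = Pair(Pair("!A", "B"), "C")  # (!A, B), C
print(show(left_tree), show(right_tree))  # prints (!A, (B, C)) ((!A, B), C)
```

Because `Pair` is a frozen dataclass, two structures are equal exactly when they are the same tree, so the two bracketings above are distinct objects, as non-associativity requires.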


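Example 2. Structures are binary trees, with formulas as leaves and commas as nodes. The structure !A, (B, C) represents the tree whose root has the leaf !A as its left child and the node (B, C) as its right child, while (!A, B), C represents the tree whose root has the node (!A, B) as its left child and the leaf C as its right child.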


Definition 3 (SDML). Let A be a set of axioms. A (non-associative/commutative) simply dependent multimodal logical system (SDML) is given by a triple Σ = (I, ⪯, f), where I is a set of indices, (I, ⪯) is a pre-order, and f is a mapping from I to 2^A.

If Σ is an SDML, then the logic described by Σ has the modality !ⁱ for every i ∈ I, with the rules of FNL depicted in Fig. 1, together with rules for the axioms f(i) and the interaction axioms !ʲA → !ⁱA for every i, j ∈ I with i ⪯ j. Finally, every SDML is assumed to be upwardly closed w.r.t. ⪯, that is, if i ⪯ j then f(i) ⊆ f(j) for all i, j ∈ I.
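For a finite signature, the two requirements of Definition 3 on Σ = (I, ⪯, f) (that ⪯ be a pre-order and that f be upwardly closed) can be checked directly. In this sketch, which is an illustration rather than part of the formal development, ⪯ is given as a set of pairs and f maps each index to a set of axiom names from {"C", "W", "A1", "A2", "E"}; the labels "lin", "aff" and "cl" are hypothetical.

```python
def is_preorder(indices, order):
    """order (a set of pairs) is reflexive and transitive on indices."""
    reflexive = all((i, i) in order for i in indices)
    transitive = all((i, k) in order
                     for (i, j) in order
                     for (j2, k) in order if j == j2)
    return reflexive and transitive


def is_upwardly_closed(order, f):
    """SDML condition: i ⪯ j implies f(i) ⊆ f(j)."""
    return all(f[i] <= f[j] for (i, j) in order)


# Hypothetical signature: a linear, an affine and a classical label.
I = {"lin", "aff", "cl"}
order = {(i, i) for i in I} | {("lin", "aff"), ("aff", "cl"), ("lin", "cl")}
f = {"lin": set(), "aff": {"W"}, "cl": {"C", "W"}}
print(is_preorder(I, order), is_upwardly_closed(order, f))  # prints True True
```

Dropping "W" from f("cl") would violate upward closure, since aff ⪯ cl but {"W"} ⊄ {"C"}.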


3 Note that the implemented rules in Fig. 2 reflect the left to right direction of such axioms only. 
Figure 2 presents the structured system acLLΣ for the logic described by the SDML determined by Σ, with A = {C, W, A1, A2, E}, where, in the subexponential rule for S ∈ A, the respective s ∈ I is such that S ∈ f(s) (e.g., the subexponential symbol e indicates that E ∈ f(e)). We will denote by !^{Ax}Δ the fact that the structure Δ contains only banged formulas as leaves, each of them assuming the axiom Ax.

As an economical notation, we will write ↑i for the upset of the index i, i.e., the set {j ∈ I : i ⪯ j}. We extend this notation to structures in the following way. Let Γ be a structure containing only banged formulas as leaves. If such formulas admit the multiset partition

{!ʲF ∈ Γ : i ⪯ j} ∪ {!ᵏF ∈ Γ : i ⋠ k and W ∈ f(k)}

then Γ^{↑i} is the structure obtained from Γ by erasing the formulas in the second component of the partition (equivalently, the substructure of Γ formed with all and only the formulas of the first component of the partition). Otherwise, Γ^{↑i} is undefined.


Example 4. Let Γ = (!ⁱA, (!ʲB, !ᵏC)), with i ⪯ j but i ⋠ k, and W ∈ f(k). In tree form, the root of Γ has the leaf !ⁱA as its left child and the node (!ʲB, !ᵏC) as its right child. Then Γ^{↑i} = (!ⁱA, !ʲB): the leaf !ᵏC is erased, and the node (!ʲB, !ᵏC) collapses to its remaining leaf !ʲB.

Observe that, if W ∉ f(k), then Γ^{↑i} cannot be built. In this case, any derivation of Γ ⇒ !ⁱ(A & B) cannot start with an application of the promotion rule !ⁱR (similarly to how promotion in ILL cannot be applied in the presence of non-classical contexts). In this case, if A, B are atomic, this sequent would not be provable.
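The operation Γ^{↑i} can be sketched as a recursive pass over the tree. The representation (a `Bang` leaf for !ᵏF, 2-tuples for pairs, plain strings for unbanged formulas) and the names `upset` and `Undefined` are assumptions for illustration. Kept leaves are those with i ⪯ k, erased leaves are those with i ⋠ k and W ∈ f(k), and any other leaf makes the result undefined; erasing a leaf collapses its parent pair, as in Example 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bang:
    """A banged leaf !^index formula."""
    index: str
    formula: str


class Undefined(Exception):
    """Raised when Gamma^{up i} does not exist."""


def upset(s, i, leq, f):
    """Compute s^{up i} for a structure s built from Bang leaves and 2-tuples.

    Leaves !^k F with leq(i, k) are kept; leaves with not leq(i, k) are erased
    if "W" is in f[k]; any other leaf makes the result undefined. An erased
    leaf becomes None, and a pair with an erased side collapses to the other.
    """
    if isinstance(s, Bang):
        if leq(i, s.index):
            return s
        if "W" in f[s.index]:
            return None
        raise Undefined(f"!^{s.index} {s.formula} can be neither kept nor erased")
    if not isinstance(s, tuple):
        raise Undefined(f"{s!r} is not a banged formula")
    left, right = (upset(t, i, leq, f) for t in s)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


order = {("i", "i"), ("j", "j"), ("k", "k"), ("i", "j")}


def leq(a, b):
    return (a, b) in order


f = {"i": set(), "j": set(), "k": {"W"}}
gamma = (Bang("i", "A"), (Bang("j", "B"), Bang("k", "C")))
print(upset(gamma, "i", leq, f))
```

With W removed from f("k"), the same call raises `Undefined`, matching the observation that Γ^{↑i} cannot be built in that case.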


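Example 5. The use of subexponentials to deal with associativity can be illustrated by the prefixing sequent A → B ⇒ (C → A) → (C → B): it is not provable for an arbitrary formula C, but if C = !ᵃC', then

$$
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{
        {!^a}C' \Rightarrow {!^a}C' \qquad (A, (A \to B)) \Rightarrow B
      }{
        (({!^a}C', ({!^a}C' \to A)), (A \to B)) \Rightarrow B
      }\;{\to}L
    }{
      ({!^a}C', (({!^a}C' \to A), (A \to B))) \Rightarrow B
    }\;A1
  }{
    (({!^a}C' \to A), (A \to B)) \Rightarrow {!^a}C' \to B
  }\;{\to}R
}{
  A \to B \Rightarrow ({!^a}C' \to A) \to ({!^a}C' \to B)
}\;{\to}R
$$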
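The structural rule for A1, as used in Example 5, can likewise be sketched as a tree rewrite. Read bottom-up, as in proof search, it turns a structure (Δ1, (Δ2, Δ3)) into ((Δ1, Δ2), Δ3), provided every leaf of Δ1 is a banged formula whose index k satisfies A1 ∈ f(k). The representation (a `Bang` leaf, 2-tuples for pairs, strings for unbanged formulas) and the helper names are assumptions for illustration, and only a rewrite at the root is shown; the rules of acLLΣ may apply it deep inside a context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bang:
    """A banged leaf !^index formula."""
    index: str
    formula: str


def all_banged_with(s, axiom, f):
    """True iff every leaf of s is a Bang whose index k has axiom in f[k]."""
    if isinstance(s, Bang):
        return axiom in f[s.index]
    if isinstance(s, tuple):
        return all(all_banged_with(t, axiom, f) for t in s)
    return False  # an unbanged formula


def a1_premise(s, f):
    """Bottom-up A1 step at the root: (D1, (D2, D3)) becomes ((D1, D2), D3).

    Applicable only if every leaf of D1 is banged with an index allowing A1;
    returns None otherwise.
    """
    if isinstance(s, tuple) and isinstance(s[1], tuple):
        d1, (d2, d3) = s
        if all_banged_with(d1, "A1", f):
            return ((d1, d2), d3)
    return None


f = {"a": {"A1"}}
c = Bang("a", "C'")
conclusion = (c, ("!a C' -> A", "A -> B"))
print(a1_premise(conclusion, f))
```

With an unbanged C in place of !ᵃC', the step is refused, which mirrors why the prefixing sequent is not provable for arbitrary C.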


2.1 Cut-Elimination 


When it comes to the proof of cut-elimination for acLLΣ, the cut reductions for the propositional connectives follow the standard steps for similar systems such as, e.g., Moot and Retoré's system NL◇ in [38, Chapter 5.2.2]. The case of structural rules, on the other hand, should be treated with care.
[Fig. 1 panels: Propositional rules; Initial and cut rules]
Fig. 1. Structured system FNL for non-associative, full Lambek calculus.


[Fig. 2 panels: Subexponential rules; Structural rules]
Fig. 2. Structured system acLLΣ for the logic described by Σ.


Theorem 6. If the sequent Γ ⇒ F is provable in acLLΣ, then it has a proof with no instances of the rule mcut.


Proof. The most representative cases of cut reductions involving subexponentials are detailed next. In order to simplify the notation, when possible, the mcut rule is presented in its simple form, with a 1-ary context.


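Case !ᵃ: Suppose that

$$
\dfrac{
  \dfrac{\pi_1 \colon \Delta_1^{\uparrow a} \Rightarrow F}{\Delta_1 \Rightarrow {!^a}F}\;{!^a}R
  \qquad
  \dfrac{\pi_2 \colon \Gamma\{(({!^a}F, \Delta_2), \Delta_3)\} \Rightarrow G}{\Gamma\{({!^a}F, (\Delta_2, \Delta_3))\} \Rightarrow G}\;A1
}{
  \Gamma\{(\Delta_1, (\Delta_2, \Delta_3))\} \Rightarrow G
}\;\mathsf{mcut}
$$

Since axioms are upwardly closed w.r.t. ⪯, it must be the case that Δ1^{↑a} contains only formulas marked with subexponentials allowing associativity. All the other formulas in Δ1 can be weakened; this is guaranteed by the application of the rule !ᵃR to the conclusion of π1. Hence the derivation above reduces to

$$
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{\pi_1 \colon \Delta_1^{\uparrow a} \Rightarrow F}{\Delta_1^{\uparrow a} \Rightarrow {!^a}F}\;{!^a}R
      \qquad
      \pi_2 \colon \Gamma\{(({!^a}F, \Delta_2), \Delta_3)\} \Rightarrow G
    }{
      \Gamma\{((\Delta_1^{\uparrow a}, \Delta_2), \Delta_3)\} \Rightarrow G
    }\;\mathsf{mcut}
  }{
    \Gamma\{(\Delta_1^{\uparrow a}, (\Delta_2, \Delta_3))\} \Rightarrow G
  }\;A1
}{
  \Gamma\{(\Delta_1, (\Delta_2, \Delta_3))\} \Rightarrow G
}\;W
$$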
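Case !ᶜ: Suppose that

$$
\dfrac{
  \dfrac{\pi_1 \colon \Delta^{\uparrow c} \Rightarrow F}{\Delta \Rightarrow {!^c}F}\;{!^c}R
  \qquad
  \dfrac{\pi_2 \colon \Gamma\{{!^c}F\}\cdots\{{!^c}F\}\cdots\{{!^c}F\} \Rightarrow G}{\Gamma\{\,\}\cdots\{{!^c}F\}\cdots\{\,\} \Rightarrow G}\;C
}{
  \Gamma\{\,\}\cdots\{\Delta\}\cdots\{\,\} \Rightarrow G
}\;\mathsf{mcut}
$$

Since Δ^{↑c} contains only formulas marked with subexponentials allowing contraction, the derivation above reduces to

$$
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{\pi_1 \colon \Delta^{\uparrow c} \Rightarrow F}{\Delta^{\uparrow c} \Rightarrow {!^c}F}\;{!^c}R
      \qquad
      \pi_2 \colon \Gamma\{{!^c}F\}\cdots\{{!^c}F\}\cdots\{{!^c}F\} \Rightarrow G
    }{
      \Gamma\{\Delta^{\uparrow c}\}\cdots\{\Delta^{\uparrow c}\}\cdots\{\Delta^{\uparrow c}\} \Rightarrow G
    }\;\mathsf{mcut}
  }{
    \Gamma\{\,\}\cdots\{\Delta^{\uparrow c}\}\cdots\{\,\} \Rightarrow G
  }\;C
}{
  \Gamma\{\,\}\cdots\{\Delta\}\cdots\{\,\} \Rightarrow G
}\;W
$$

Observe that here, as usual, the multicut rule is needed in order to reduce the cut complexity.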
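Case !ʲR: Suppose that

$$
\dfrac{
  \dfrac{\pi_1 \colon \Delta^{\uparrow i} \Rightarrow F}{\Delta \Rightarrow {!^i}F}\;{!^i}R
  \qquad
  \dfrac{\pi_2 \colon (\Gamma\{{!^i}F\})^{\uparrow j} \Rightarrow G}{\Gamma\{{!^i}F\} \Rightarrow {!^j}G}\;{!^j}R
}{
  \Gamma\{\Delta\} \Rightarrow {!^j}G
}\;\mathsf{mcut}
$$

If j ⋠ i, then it should be the case that W ∈ f(i) and (Γ{!ⁱF})^{↑j} = Γ{ }^{↑j}, since !ⁱF will be weakened in the application of rule !ʲR. Hence, all formulas in Δ can be weakened as well and the reduction is

$$
\dfrac{
  \dfrac{\pi_2 \colon \Gamma\{\,\}^{\uparrow j} \Rightarrow G}{\Gamma\{\,\} \Rightarrow {!^j}G}\;{!^j}R
}{
  \Gamma\{\Delta\} \Rightarrow {!^j}G
}\;W
$$

On the other hand, if j ⪯ i, by transitivity all the formulas in Δ^{↑i} also have this property (implying that Δ^{↑i} is a substructure of Δ^{↑j}), and the rest of the formulas of Δ can be weakened. Hence the derivation above reduces to

$$
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{\pi_1 \colon \Delta^{\uparrow i} \Rightarrow F}{\Delta^{\uparrow i} \Rightarrow {!^i}F}\;{!^i}R
      \qquad
      \pi_2 \colon (\Gamma\{{!^i}F\})^{\uparrow j} \Rightarrow G
    }{
      (\Gamma\{\Delta^{\uparrow i}\})^{\uparrow j} \Rightarrow G
    }\;\mathsf{mcut}
  }{
    \Gamma\{\Delta^{\uparrow i}\} \Rightarrow {!^j}G
  }\;{!^j}R
}{
  \Gamma\{\Delta\} \Rightarrow {!^j}G
}\;W
$$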


The other cases for subexponentials are similar or simpler. 


The next examples illustrate what we mean by acLLΣ being a "conservative extension" of subsystems and variants. Indeed, although we remove structural properties of the core LL, subexponentials allow them to be added back, either locally or globally.


Example 7 (Structural variants of iMALL). Adding combinations of contraction C and/or weakening W for arbitrary formulas to additive-multiplicative intuitionistic linear logic (iMALL) yields, respectively, propositional intuitionistic logic ILP = iMALL + {C, W}, and the intuitionistic versions of affine linear logic aLL = iMALL + W and relevant logic R = iMALL + C. For the sake of presentation, we overload the notation and use the connectives of linear logic also for these logics. In order to embed the logics above into acLL$_\Sigma$, let $\mathcal{L} \in \{\mathrm{ILP}, \mathrm{aLL}, \mathrm{R}\}$ and consider the modality $!^a$ with $f(a) = \{E, A1, A2\} \cup A$, where $A \subseteq \{C, W\}$ is the set of axioms whose corresponding rules are in $\mathcal{L}$. The translation $T_{\mathcal{L}}$ prefixes every subformula with the modality $!^a$. It is then straightforward to show that a structured sequent $S$ is cut-free derivable in $\mathcal{L}$ iff its translation $T_{\mathcal{L}}(S)$ is cut-free derivable in the logic described by $(\{a\}, \preceq, f)$, with $\preceq$ the obvious relation and $f$ as given above.
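The translation $T_{\mathcal{L}}$ and the signature $f(a)$ of Example 7 are simple enough to sketch in code. The Python below is our own illustration, not the paper's: it assumes atoms are strings, binary formulas are tuples `(connective, left, right)`, and `("!", label, body)` stands for $!^{label}$ applied to `body`; only binary connectives are handled, and the names `EXTRA_RULES`, `signature_rules`, and `translate` are hypothetical.

```python
# Hypothetical encoding (an assumption for illustration): atoms are strings,
# binary formulas are (connective, left, right), and ("!", label, body)
# stands for the subexponential !^label applied to body.

# Structural rules that each logic adds on top of iMALL (Example 7).
EXTRA_RULES = {"ILP": {"C", "W"}, "aLL": {"W"}, "R": {"C"}}


def signature_rules(logic: str) -> set:
    """f(a) = {E, A1, A2} ∪ A, with A the extra structural rules of `logic`."""
    return {"E", "A1", "A2"} | EXTRA_RULES[logic]


def translate(formula, label: str = "a"):
    """T_L: prefix every subformula, atoms included, with !^label."""
    if isinstance(formula, str):
        return ("!", label, formula)
    connective, left, right = formula
    return ("!", label, (connective, translate(left, label), translate(right, label)))
```

For instance, `translate(("⊗", "p", "q"))` yields the encoding of $!^a(!^a p \otimes !^a q)$; lifting it to structured sequents would apply `translate` to every formula occurring in the structure.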


Example 8 (Structural variants of FNL). Following the same script as above and starting from FNL:

– considering $f(a) = A \subseteq \{E, A1, A2\}$:
  • if $A = \{A1, A2\}$, then we obtain the system FL;
  • if $A = \{E, A1, A2\}$, then the resulting system corresponds to iMALL;
  • adding C, W as options to $A$ results in the affine/relevant versions of the systems above.
– in a pre-order $(I, \preceq)$, if $f(i) = \{A1, A2\} \cup A_i$, where $A_i \subseteq \{E, C, W\}$, for each $i \in I$, then the resulting system corresponds to SMALC$_\Sigma$ in [21] (that is, the extension of FL with subexponentials).


2.2 An Example in Linguistics 


Since its inception, Lambek calculus [29] has been applied to the modeling of natu- 
ral language syntax by means of categorial grammars. In a categorial grammar, each 
word is assigned one or several Lambek formulas, which serve as syntactic categories. 
For a simple example, John and Mary are assigned $np$ (“noun phrase”) and loves gets $(np \backslash s)/np$. Here $s$ stands for “sentence”, and loves is a transitive verb, which lacks noun phrases on both sides to become a sentence. Grammatical validity of “John loves Mary” is supported by derivability of the sequent $np, (np \backslash s)/np, np \Rightarrow s$. Notice that this sequent remains derivable also in the non-associative setting, provided the correct nested structure is given: $(np, ((np \backslash s)/np, np)) \Rightarrow s$.

The original Lambek calculus L is associative. In some cases, however, associativity 
leads to over-generation, i.e., validation of grammatically incorrect sentences. Lambek 
himself realized this and proposed the non-associative calculus NL in [30]. We will 
illustrate this issue with the example given in [38, Sect. 4.2.2]. The syntactic category assignment is as follows (where $n$ stands for “noun”):


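$$
\begin{array}{ll}
\text{Words} & \text{Types}\\ \hline
\text{the} & np/n\\
\text{Hulk} & n\\
\text{is} & (np \backslash s)/(n/n)\\
\text{green, incredible} & n/n
\end{array}
$$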


With this assignment, sentences “The Hulk is green” and “The Hulk is incredible” 
are correctly marked as valid, by deriving the sequent 


$$(np/n, n), ((np \backslash s)/(n/n), n/n) \Rightarrow s$$


However, in the associative setting the sequent for the phrase “The Hulk is green 
incredible,” which is grammatically incorrect, also becomes derivable: 


$$np/n, n, (np \backslash s)/(n/n), n/n, n/n \Rightarrow s,$$

essentially due to the derivability of $n/n, n/n \Rightarrow n/n$.
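A quick way to see why the bracketed sequents above go through, while “green incredible” is blocked without associativity, is to check them with function application alone: the rules $A, A \backslash B \Rightarrow B$ and $B/A, A \Rightarrow B$, applied along a fixed bracketing. This application-only (AB-style) check is a sound but incomplete fragment of NL, so a failure below does not by itself prove non-derivability; the tuple encoding of categories and the name `derivable_categories` are our own assumptions.

```python
# Categories: atoms are strings; ("\\", A, B) encodes A\B and ("/", B, A) encodes B/A.
# Structures: a category (one word) or a two-element list [left, right] (a bracket).

def derivable_categories(structure) -> set:
    """Categories obtainable from `structure` using only the two application rules."""
    if not isinstance(structure, list):
        return {structure}
    left, right = structure
    results = set()
    for a in derivable_categories(left):
        for b in derivable_categories(right):
            if isinstance(b, tuple) and b[0] == "\\" and b[1] == a:
                results.add(b[2])  # A, A\B  =>  B
            if isinstance(a, tuple) and a[0] == "/" and a[2] == b:
                results.add(a[1])  # B/A, A  =>  B
    return results


THE = ("/", "np", "n")   # np/n
ADJ = ("/", "n", "n")    # n/n (green, incredible)
VP = ("\\", "np", "s")   # np\s
IS = ("/", VP, ADJ)      # (np\s)/(n/n)

hulk_is_green = [[THE, "n"], [IS, ADJ]]
hulk_is_green_incredible = [[THE, "n"], [[IS, ADJ], ADJ]]
```

Here `derivable_categories(hulk_is_green)` contains `"s"`, whereas neither bracketing of “is green incredible” yields any category by application alone.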

In other situations, however, associativity is useful. Standard examples include han- 
dling of dependent clauses, e.g., “the girl whom John loves,” which is validated as a 
noun phrase by the following derivable sequent: 


$$np/n, n, (n \backslash n)/(s/np), np, (np \backslash s)/np \Rightarrow np$$

Here $(n \backslash n)/(s/np)$ is the syntactic category for whom.

Our subexponential extension of NL, however, handles this case using local associativity instead of the global one. Namely, the category for whom now becomes $(n \backslash n)/(s/{!^a}np)$, where $!^a$ is a subexponential which allows the A2 rule, and the following sequent is happily derivable:

$$np/n, (n, ((n \backslash n)/(s/{!^a}np), (np, (np \backslash s)/np))) \Rightarrow np$$


The necessity of this more fine-grained control of associativity, instead of a global associativity rule, is seen via a combination of these examples. Namely, consider sentences like “The superhero whom Hawkeye killed was incredible” and “... was green”. With $!^a$, each of them is handled in the same way as the previous examples:

$$(np/n, (n, ((n \backslash n)/(s/{!^a}np), (np, (np \backslash s)/np)))), ((np \backslash s)/(n/n), n/n) \Rightarrow s.$$


Non-associative, Non-commutative Multi-modal Linear Logic 459 


On one hand, without $!^a$ this sequent cannot be derived in the non-associative system. On the other hand, if we make the system globally associative, it would validate incorrect sentences like “The superhero whom Hawkeye killed was green incredible.”


3 Some Undecidability Results 


Non-associativity makes a significant difference in decidability and complexity matters. 
For example, while L is NP-complete [47], NL is decidable in polynomial time [1, 14]. 

For our system acLL$_\Sigma$, its decidability or undecidability depends on its signature $\Sigma$. In fact, we have a family of different systems acLL$_\Sigma$, with $\Sigma$ as a parameter. Recall that the subexponential signature $\Sigma$ controls not only the number of subexponentials and the preorder among them; more importantly, it dictates, for each subexponential, which structural rules this subexponential licenses. If for every $i \in I$ we have $C \notin f(i)$, that is, no subexponential allows contraction, then acLL$_\Sigma$ is clearly decidable, since the cut-free proof search space is finite. Therefore, for undecidability it is necessary to have at least one subexponential which allows contraction.

For a non-associative system with only one fully-powered exponential modality $!^s$ (that is, $f(s) = \{E, C, W, A1, A2\}$), undecidability was proven in a preprint by Tanaka [51], based on Chvalovsky’s [11] result on undecidability of the finitary consequence relation in FNL.

In this section, we prove two undecidability results. The first one is a refinement of Tanaka’s result: we establish undecidability with at least one subexponential which allows contraction and weakening (commutativity/associativity are optional), in a subsystem containing only the additive connective $\oplus$ and the multiplicatives $\otimes$ and $\backslash$.

The second undecidability result is for the minimalistic, purely multiplicative fragment, which includes only $\backslash$ (not even $\otimes$). As a trade-off, however, it requires two subexponentials: the “main” one, which allows contraction, exchange, and associativity (weakening is optional), and an “auxiliary” one, which allows only associativity.

It should be noted that this undecidability result is orthogonal to Tanaka’s [51], 
and the proof technique is essentially different. Indeed, Chvalovsky’s undecidability 
theorem does not hold for the non-associative Lambek calculus without additives, where 
the consequence relation is decidable [7]. 

Finally, we observe that if the intersection of these systems is decidable (which 
is still an open question), then our two undecidability results are incomparable: we 
have two undecidable fragments of acLL$_\Sigma$, but their common part, which includes only
divisions and one exponential, would be decidable. 


3.1 Undecidability with Additives and One Subexponential 


We are going to derive the next theorem from undecidability of the finitary consequence 
relation in FNL [11]. Recall that FNL is, in fact, the fragment of acLL$_\Sigma$ without subexponentials (that is, with an empty $I$).


Theorem 9. If there exists $s \in I$ such that $f(s) \supseteq \{C, W\}$, then the derivability problem in acLL$_\Sigma$ is undecidable. Moreover, this holds for the fragment with only $\otimes$, $\backslash$, $\oplus$, $!^s$.
In fact, using C and W, one can also derive A1, A2, E1, and E2. Therefore, if $f(s) \supseteq \{C, W\}$, then $!^s$ is actually a full-power exponential modality. (In the proof of Theorem 9 below, we use only the W and C rules, in order to avoid confusion.) However, Theorem 9 does not directly follow from undecidability of propositional linear logic [32], because here the basic system is non-associative and non-commutative, while linear logic is both associative and commutative. Thus, we need a different encoding for undecidability.

Let $\Phi$ be a finite set of FNL sequents. By FNL($\Phi$) let us denote FNL extended by adding sequents from $\Phi$ as additional (non-logical) axioms. In general, FNL($\Phi$) does not enjoy cut-elimination, so mcut is kept as a rule of inference in FNL($\Phi$). A sequent $\Gamma \Rightarrow F$ is called a consequence of $\Phi$ if this sequent is derivable in FNL($\Phi$).


Theorem 10 (Chvalovsky [11]). The consequence relation in FNL is undecidable, that is, there exists no algorithm which, given $\Phi$ and $\Gamma \Rightarrow F$, determines whether $\Gamma \Rightarrow F$ is a consequence of $\Phi$. Moreover, undecidability remains valid when $\Phi$ and $\Gamma \Rightarrow F$ are built from variables using only $\otimes$ and $\oplus$.


Now, in order to prove Theorem 9, we internalize $\Phi$ into the sequent using $!^s$, assuming $f(s) \supseteq \{C, W\}$.

First we notice that we may suppose, without loss of generality, that all sequents in $\Phi$ are of the form $\Rightarrow A$, that is, have empty antecedents. Namely, each sequent of the form $\Pi \Rightarrow B$ can be replaced by $\Rightarrow (\otimes\Pi) \backslash B$, where $\otimes\Pi$ is obtained from $\Pi$ by replacing each comma with $\otimes$. Indeed, these sequents are derivable from one another: from $\Pi \Rightarrow B$ to $\Rightarrow (\otimes\Pi) \backslash B$ we apply a sequence of $\otimes L$ followed by $\backslash R$, and for the other direction we apply a series of cuts, first with $(\otimes\Pi, (\otimes\Pi) \backslash B) \Rightarrow B$, and then with $(F, G) \Rightarrow F \otimes G$ several times, for the corresponding subformulas of $\otimes\Pi$. The following embedding lemma (“modalized deduction theorem”) holds.
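The passage from $\Pi \Rightarrow B$ to $\Rightarrow (\otimes\Pi) \backslash B$ is purely syntactic. The following minimal Python sketch is our own: a non-empty antecedent is assumed to be either a formula string or a pair `(left, right)`, and `tensor_of` and `internalize` are hypothetical names. It renders $\otimes\Pi$ and the internalized formula as strings, preserving the bracketing of $\Pi$.

```python
def tensor_of(structure) -> str:
    """⊗Π: replace every comma of the (non-empty) structure Π by ⊗."""
    if isinstance(structure, str):  # a single formula
        return structure
    left, right = structure
    return f"({tensor_of(left)} ⊗ {tensor_of(right)})"


def internalize(antecedent, succedent: str) -> str:
    """Formula A such that the axiom Π ⇒ B is replaced by ⇒ A = (⊗Π)\\B."""
    return f"{tensor_of(antecedent)} \\ {succedent}"
```

For example, the axiom $(p, (q, r)) \Rightarrow s$ becomes $\Rightarrow (p \otimes (q \otimes r)) \backslash s$.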


Lemma 11. The sequent $\Gamma \Rightarrow F$ is a consequence of $\Phi = \{\Rightarrow A_1, \ldots, \Rightarrow A_n\}$ if and only if the sequent $((\ldots((!^sA_1, !^sA_2), !^sA_3), \ldots, !^sA_n), \Gamma) \Rightarrow F$ is derivable in acLL$_\Sigma$.
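The left-nested structure used in Lemma 11 can likewise be generated mechanically. This sketch is ours: formulas are plain strings, the subexponential label defaults to `s`, and `bang_phi` and `modalized_sequent` are hypothetical helper names.

```python
def bang_phi(axiom_formulas, label: str = "s") -> str:
    """(...((!A1, !A2), !A3), ..., !An) for Φ = {⇒ A1, ..., ⇒ An}, n ≥ 1."""
    if not axiom_formulas:
        raise ValueError("Φ must contain at least one sequent")
    structure = f"!^{label} {axiom_formulas[0]}"
    for formula in axiom_formulas[1:]:
        # Nest to the left: the structure built so far becomes the left component.
        structure = f"({structure}, !^{label} {formula})"
    return structure


def modalized_sequent(axiom_formulas, gamma: str, goal: str) -> str:
    """The acLL sequent (!Φ, Γ) ⇒ F that Lemma 11 relates to Φ."""
    return f"({bang_phi(axiom_formulas)}, {gamma}) ⇒ {goal}"
```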


Proof. Let us denote $(\ldots((!^sA_1, !^sA_2), !^sA_3), \ldots, !^sA_n)$ by $!\Phi$. Notice that C and W can be applied to $!\Phi$ as a whole; this is easily proven by induction on $n$.

For the “only if” direction let us take the derivation of $\Gamma \Rightarrow F$ in FNL($\Phi$) (with cuts) and replace each sequent of the form $\Delta \Rightarrow G$ in it with $(!\Phi, \Delta) \Rightarrow G$, and each sequent of the form $\Rightarrow G$ with $!\Phi \Rightarrow G$. The translations of non-logical axioms from $\Phi$ are derived as follows:

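$$
\dfrac{
\dfrac{
\dfrac{}{A_i \Rightarrow A_i}\ \mathit{init}
}{!^sA_i \Rightarrow A_i}\ \mathit{der}
}{!\Phi \Rightarrow A_i}\ W,\ n-1 \text{ times}
$$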


Translations of axioms init and $1R$ are derived from the corresponding original axioms by W, $n$ times; $\top R$ remains valid.




Rules $\otimes L$, $\oplus L$, $\oplus R_i$, $\& L_i$, $\& R$, and $1L$ remain valid. For $\backslash L$, $/L$, and mcut we contract $!\Phi$ as a whole:


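$$
\dfrac{
\dfrac{(!\Phi, \Delta) \Rightarrow F \qquad (!\Phi, \Gamma\{G\}) \Rightarrow H}{(!\Phi, \Gamma\{((!\Phi, \Delta), F \backslash G)\}) \Rightarrow H}\ \backslash L
}{(!\Phi, \Gamma\{(\Delta, F \backslash G)\}) \Rightarrow H}\ C
\qquad
\dfrac{
\dfrac{(!\Phi, \Delta) \Rightarrow F \qquad (!\Phi, \Gamma\{F\}\ldots\{F\}) \Rightarrow G}{(!\Phi, \Gamma\{(!\Phi, \Delta)\}\ldots\{(!\Phi, \Delta)\}) \Rightarrow G}\ \mathit{mcut}
}{(!\Phi, \Gamma\{\Delta\}\ldots\{\Delta\}) \Rightarrow G}\ C
$$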


For $\otimes R$, $\backslash R$, and $/R$, we combine contraction and weakening:


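$$
\dfrac{
\dfrac{
\dfrac{(!\Phi, \Gamma) \Rightarrow F \qquad (!\Phi, \Delta) \Rightarrow G}{((!\Phi, \Gamma), (!\Phi, \Delta)) \Rightarrow F \otimes G}\ \otimes R
}{(!\Phi, ((!\Phi, \Gamma), (!\Phi, \Delta))) \Rightarrow F \otimes G}\ W
}{(!\Phi, (\Gamma, \Delta)) \Rightarrow F \otimes G}\ C
\qquad
\dfrac{(!\Phi, (F, \Gamma)) \Rightarrow G}{(!\Phi, \Gamma) \Rightarrow F \backslash G}\ W,\ C,\ \backslash R
$$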


Notice that our original derivation was in FNL($\Phi$), so it does not include rules operating on subexponentials.

For the “if” direction we take a cut-free proof of $(!\Phi, \Gamma) \Rightarrow F$ in acLL$_\Sigma$ and erase all formulas which include the subexponential. In the resulting derivation tree all rules and axioms, except those which operate on $!^s$, remain valid. Structural rules for $!^s$ trivialize (since the $!$-formula was erased). The $!^sR$ rule could not have been used, since we do not have positive occurrences of $!^sF$, and our proof is cut-free.

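Finally, der translates into

$$\dfrac{\Gamma\{A_i\} \Rightarrow G}{\Gamma\{\ \} \Rightarrow G}$$

This is modeled by cut with one of the sequents from $\Phi$:

$$\dfrac{\Rightarrow A_i \qquad \Gamma\{A_i\} \Rightarrow G}{\Gamma\{\ \} \Rightarrow G}\ \mathit{mcut}$$

Thus, we get a correct derivation in FNL($\Phi$).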


Theorem 10 and Lemma 11 immediately yield Theorem 9. 


3.2 Undecidability Without Additives and with Two Subexponentials 


Theorem 12. If there are $a, c \in I$ such that $f(a) = \{A1, A2\}$ and $f(c) \supseteq \{C, E, A1, A2\}$, then the derivability problem in acLL$_\Sigma$ is undecidable. Moreover, this holds for the fragment with only $\backslash$, $!^a$, and $!^c$.


Remember from Example 8 that SMALC$_\Sigma$ [21] denotes the extension of FL with subexponentials. The undecidability theorem above is proved by encoding the one-division fragment of SMALC$_\Sigma$ containing one exponential $!^c$ such that $f(c) \supseteq \{C, E\}$. It turns out that such a system is undecidable.


Theorem 13 (Kanovich et al. [22,23]). If there exists $c \in I$ such that $f(c) \supseteq \{C, E\}$, then the derivability problem in SMALC$_\Sigma$ is undecidable. Moreover, this holds for the fragment with only $\backslash$ and $!^c$.
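Observe that SMALC$_\Sigma$ can be obtained from acLL$_\Sigma$ by adding “global” associativity rules:

$$
\dfrac{\Gamma\{((\Delta_1, \Delta_2), \Delta_3)\} \Rightarrow G}{\Gamma\{(\Delta_1, (\Delta_2, \Delta_3))\} \Rightarrow G}
\qquad
\dfrac{\Gamma\{(\Delta_1, (\Delta_2, \Delta_3))\} \Rightarrow G}{\Gamma\{((\Delta_1, \Delta_2), \Delta_3)\} \Rightarrow G}
$$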


The usual formulation of SMALC$_\Sigma$, of course, uses sequences of formulas instead of nested structures as antecedents. The alternative formulation, however, is more convenient for us now. It will also be convenient to regard all subexponentials in SMALC$_\Sigma$ as associative, that is, $f(s) \supseteq \{A1, A2\}$ for each $s \in I$.

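In order to embed SMALC$_\Sigma$ into acLL$_\Sigma$, we define two translations, $A^{!-}$ and $A^{!+}$, by mutual recursion:

$$
\begin{array}{ll}
z^{!-} = {!^a}z & z^{!+} = z \quad \text{where } z \text{ is a variable, } \mathbf{1}, \text{ or } \top\\
(A \backslash B)^{!-} = {!^a}(A^{!+} \backslash B^{!-}) & (A \backslash B)^{!+} = A^{!-} \backslash B^{!+}\\
(B / A)^{!-} = {!^a}(B^{!-} / A^{!+}) & (B / A)^{!+} = B^{!+} / A^{!-}\\
(A \circ B)^{!-} = {!^a}(A^{!-} \circ B^{!-}) & (A \circ B)^{!+} = A^{!+} \circ B^{!+} \quad \text{where } \circ \in \{\otimes, \oplus, \&\}\\
(!^sA)^{!-} = {!^s}(A^{!-}) & (!^sA)^{!+} = {!^s}(A^{!+})
\end{array}
$$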


Informally, our translation adds a $!^a$ over any formula (not only over atoms) of negative polarity, unless this formula was already marked with a $!^s$. Thus, all formulae in antecedents begin with either the new subexponential $!^a$ or one of the old subexponentials $!^s$, and all these subexponentials allow the associativity rules A1 and A2.
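The mutual recursion defining $A^{!-}$ and $A^{!+}$ can be transcribed almost literally. The following Python sketch assumes a tuple encoding of our own: atoms (variables, `"1"`, `"⊤"`) are strings, `("\\", A, B)` is $A \backslash B$, `("/", B, A)` is $B/A$, `(op, A, B)` with `op` one of ⊗, ⊕, & is a binary connective, and `("!", s, A)` is $!^sA$; the fresh label is `a`, and `neg`, `pos`, and `erase` are hypothetical names. The `erase` helper reflects the observation, used in the proof of Lemma 14, that deleting $!^a$ turns both translations into the identity.

```python
NEW = "a"  # fresh subexponential label with f(a) = {A1, A2}


def neg(f):
    """A^{!-}: translation of a formula occurring in negative position."""
    if isinstance(f, str):                   # variable, 1, or ⊤
        return ("!", NEW, f)
    op, x, y = f
    if op == "!":                            # already marked with !^s: no new !^a
        return ("!", x, neg(y))
    if op == "\\":                           # (A\B)^{!-} = !^a(A^{!+} \ B^{!-})
        return ("!", NEW, ("\\", pos(x), neg(y)))
    if op == "/":                            # (B/A)^{!-} = !^a(B^{!-} / A^{!+})
        return ("!", NEW, ("/", neg(x), pos(y)))
    return ("!", NEW, (op, neg(x), neg(y)))  # ⊗, ⊕, &


def pos(f):
    """A^{!+}: translation of a formula occurring in positive position."""
    if isinstance(f, str):
        return f
    op, x, y = f
    if op == "!":
        return ("!", x, pos(y))
    if op == "\\":                           # (A\B)^{!+} = A^{!-} \ B^{!+}
        return ("\\", neg(x), pos(y))
    if op == "/":                            # (B/A)^{!+} = B^{!+} / A^{!-}
        return ("/", pos(x), neg(y))
    return (op, pos(x), pos(y))


def erase(f):
    """Delete every occurrence of !^a (assumes the input never used label a)."""
    if isinstance(f, str):
        return f
    op, x, y = f
    if op == "!":
        return erase(y) if x == NEW else ("!", x, erase(y))
    return (op, erase(x), erase(y))
```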


Lemma 14. A sequent $A_1, \ldots, A_n \Rightarrow B$ is derivable in SMALC$_\Sigma$ if and only if its translation $(\ldots(A_1^{!-}, A_2^{!-}), \ldots, A_n^{!-}) \Rightarrow B^{!+}$ is derivable in acLL$_\Sigma$.


Proof. For the “only if” part, let us first note that each formula $A_i^{!-}$ is of the form $!^sF$ with $A1, A2 \in f(s)$. Indeed, either $s$ is an “old” subexponential label (for which we added A1, A2) or $s = a$. Thus brackets can be freely rearranged in the antecedent.

Now we take a cut-free proof of $A_1, \ldots, A_n \Rightarrow B$ in SMALC$_\Sigma$ and replace each sequent in it with its translation. Right rules for connectives other than subexponentials, i.e., $\otimes R$, $\oplus R_i$, $\& R$, $\backslash R$, and $/R$, remain valid as they are, up to rearranging brackets in antecedents. For $!^iR$, we notice that the translation of a formula of the form $!^jF$, where $i \preceq j$, is also a formula of the form $!^jF'$. Thus, this rule also remains valid. The same holds for the dereliction rule der, because $(!^iF)^{!-}$ is exactly $!^i(F^{!-})$. Finally, the “old” structural rules (exchange, contraction, weakening) also remain valid (up to rearranging of brackets), since $!^iF$ gets translated into $!^i(F^{!-})$, which enjoys the same structural rules.

For the other left rules, we need to derelict $!^a$ first, and then perform the corresponding rule application. Rearrangement of brackets, if needed, is performed below dereliction or above the application of the rule in question.

The “if” part is easier. Given a derivation of $(\ldots(A_1^{!-}, A_2^{!-}), \ldots, A_n^{!-}) \Rightarrow B^{!+}$ in acLL$_\Sigma$, we erase $!^a$ everywhere, and consider it as a derivation in SMALC$_\Sigma$. Associativity rules for the erased $!^a$ (which are the only structural rules for this subexponential) remain valid, because now associativity is global. Dereliction and right introduction for $!^a$ trivialize. All other rules, which do not operate on $!^a$, remain as they are. Thus, we get a derivation of $A_1, \ldots, A_n \Rightarrow B$ in SMALC$_\Sigma$, since erasing $!^a$ turns both translations into the identity.
4 Related Work and Conclusion


In this paper, we have presented acLL$_\Sigma$, a sequent-based system for non-associative, non-commutative linear logic with subexponentials. Starting from FNL, we modularly and uniformly added rules for exchange, associativity, weakening and contraction, which can be applied to formulas marked with subexponentials having the respective features. This allows for the application of structural rules locally, and it conservatively extends well-known systems in the literature, continuing the path of controlling structural properties started by Girard himself [16].

Another approach to combining associative and non-associative behavior in 
Lambek-style grammars is the framework of the Lambek calculus with brackets by 
Morrill [39,40] and Moortgat [34]. The bracket approach is dual to ours: there the 
base system is associative, and brackets, which are controlled by bracket modalities, 
introduce local non-associativity. Both the associative Lambek calculus and the non- 
associative Lambek calculus can be embedded into the Lambek calculus with brackets: 
the former by design of the system, and the latter via a translation constructed by Kurtonina [26].

From the point of view of generative power, however, the (associative) Lambek 
calculus with brackets is weaker than the non-associative system with subexponentials, 
which is presented in this paper. Namely, as shown by Kanazawa [19], grammars based 
on the Lambek calculus with brackets can generate only context-free languages. In 
contrast, grammars based on our system with subexponentials go beyond context-free 
languages, even when no subexponential allows contraction (subexponentials allowing 
contraction may lead to undecidability, as shown in the last section). 

As a quick example, let us consider a subexponential which allows both associativity (A1 and A2) and exchange (E). If we put this subexponential over any (sub)formula, the system becomes associative and commutative. Using this system, one can describe the non-context-free language MIX$_3$, which contains all non-empty words over $\{a, b, c\}$ in which the numbers of a, b, and c are equal. Indeed, MIX$_3$ is the permutation closure of the language $\{(abc)^n \mid n \geq 1\}$. The latter is regular, therefore context-free, and therefore definable by a Lambek grammar. The ability of our system to go beyond context-free languages is important from the point of view of applications, since there are known linguistic phenomena which are essentially non-context-free [49].
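The claim that MIX$_3$ is the permutation closure of $\{(abc)^n \mid n \geq 1\}$ can be sanity-checked by brute force on short words. The Python sketch below (function names are our own) compares, for small $n$, the words of length $3n$ with equal numbers of a, b, and c against all rearrangements of $(abc)^n$; it is a finite check, not a proof.

```python
from itertools import permutations, product


def in_mix3(word: str) -> bool:
    """Non-empty word over {a, b, c} with equally many a's, b's and c's."""
    return (word != "" and set(word) <= set("abc")
            and word.count("a") == word.count("b") == word.count("c"))


def mix3_words(length: int) -> set:
    """All words of the given length that belong to MIX3."""
    words = ("".join(letters) for letters in product("abc", repeat=length))
    return {w for w in words if in_mix3(w)}


def rearrangements_of_abc_power(n: int) -> set:
    """Permutation closure of the single word (abc)^n."""
    return {"".join(p) for p in permutations("abc" * n)}
```

For $n = 2$ both sets contain the same 90 words; lengths that are not multiples of 3 contain no MIX$_3$ words at all.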

Regarding decidability, let us compare our results with the better-known associative non-commutative and associative commutative cases.

In the associative and commutative case the situation is as follows. In the pres- 
ence of additives, the system is known to be undecidable with one exponential modal- 
ity [32]. Without additives, we get MELL, the (un)decidability of which is a well-known 
open problem [50]. However, with two subexponentials MELL again becomes undecid- 
able [9]. Thus, we have the same trade-off as in our non-associative non-commutative 
case: for undecidability one needs either additives, or two subexponentials. 

Our results help to shed some light on the (un)decidability problem for the spectrum of logical systems surrounding MELL/FNL, allowing for a fine-grained analysis of the problem, especially the trade-offs on connectives and subexponentials for guaranteeing (un)decidability.
There is a lot to be done from now on. First of all, we would like to better analyze the minimalist fragment of acLL$_\Sigma$ containing only implication and one fully-powered subexponential, as it seems to be crucial for understanding the lower bound of undecidability (or the upper bound of decidability). Second, the use of acLL$_\Sigma$ in modeling natural language syntax should be explored further. The examples in Sect. 2.2 show how to locally combine sentences with different grammatical characteristics, and the MIX$_3$ example above illustrates how that can be of importance. In particular, a formal study of acLL$_\Sigma$ in relation to categorial grammars would be of interest. Third, we plan to investigate the connections between our work and Adjoint logic [48] as well as Display calculus [5,12]. Finally, we intend to study proof-theoretic properties of acLL$_\Sigma$, such as normalization of proofs (e.g., via focusing) and interpolation.
Abstract. A four-valued semantics for the modal logic K is introduced.
Possible worlds are replaced by a hierarchy of four-valued valuations, 
where the valuations of the first level correspond to valuations that 
are legal w.r.t. a basic non-deterministic matrix, and each level further 
restricts its set of valuations. The semantics is proven to be effective, and 
to precisely capture derivations in a sequent calculus for K of a certain 
form. Similar results are then obtained for the modal logic KT, by simply 
deleting one of the truth values. 


1 Introduction 


Propositional modal logics extend classical logic with modalities, intuitively interpreted as necessity, knowledge, or temporal operators. Such extensions have several applications in computer science and artificial intelligence (see, e.g., [7, 9, 13]).

The most common and successful semantic framework for modal logics is the so-called possible worlds semantics, in which each world is equipped with a two-valued valuation, and the semantic constraints regarding the modal operators consider the valuations in accessible worlds. While this has been the gold standard for modal logic semantics for many years, alternative semantic frameworks have been proposed. One of these approaches, initiated by Kearns [10], is based on an infinite sequence of sets of valuations in a non-deterministic many-valued semantics. Since then, several non-deterministic many-valued semantics, without possible worlds, were developed for modal logics (see, e.g., [4,8,12,14]). The current paper is a part of that body of work. Having an alternative semantic framework for modal logics, different from the common possible worlds semantics, has the potential to expose new intuitions about and understandings of modal logics, and also to form the basis for new decision procedures.

Our main contribution is a four-valued semantics for the modal logic K. The key characteristic of the semantics that we present is effectiveness: when checking for the entailment of a formula $\varphi$ from a set $\Gamma$ of formulas in K, it suffices to only consider partial models, defined over the subformulas of $\Gamma$ and $\varphi$. To the best of our knowledge, this is the first effective Nmatrices-based semantics for K. Such a semantics has the potential of being subject to reductions to classical satisfiability [3], as it is based on finite-valued truth tables, and thus of improving the performance of solvers for modal logic by utilizing off-the-shelf SAT solvers. Another advantage of this semantics is that it precisely captures derivations in a sequent calculus for K that admit a certain property. Following Kearns, models of this semantics are based on the concept of levels: valuations of level 0 are the ordinary valuations of Nmatrices, while each level $m > 0$ introduces more constraints. We show that valuations of level $m$ correspond to derivations in the calculus in which the largest number of applications of the rule corresponding to the axiom (K) along any branch of the derivation is at most $m$. Our restrictions between the levels are more complex than the original restrictions in Kearns’ work, in order to obtain effectiveness. Another precise correspondence that we prove between the semantics and the proof system relates the domains of valuations to the formulas allowed to be used in derivations.

Finally, we observe that by deleting one of the truth values, a three-valued semantics for the modal logic KT is obtained, which is similar to the one presented in [8]. As in the case of K, the resulting semantics is effective, and tightly corresponds to derivations in a sequent calculus for KT.


Outline. The paper is organized as follows: Sect. 2 reviews standard notions in 
non-deterministic matrices. In Sect. 3, we present our semantics for the modal 
logic K, as well as the sequent calculus our investigation will be based on, which 
is coupled with the notion of (K)-depth of derivations. In Sect. 4, we prove soundness and completeness theorems between the sequent calculus and the semantics. In Sect. 5, we prove that the semantics that we provide is effective, not only for deciding entailment, but also for producing countermodels when an entailment does not hold. In Sect. 6 we establish similar results for the modal logic KT. We conclude with Sect. 7, where directions for future research are outlined.


Related Work. In [10], Kearns initiated the study of modal semantics without 
possible worlds. This work was recently revisited by Skurt and Omori [14], who 
generalized Kearns’ work and recast it within the framework of logical non-deterministic matrices. As indicated in [14], it was not clear how to
make this semantics effective, as it requires checking truth values of infinitely 
many formulas when considering the validity of a given formula (see, e.g., Remark 
42 of [14]). In [4], Coniglio et al. develop a similar framework for modal logics, 
and some bound over the formulas that need to be considered was achieved. 
However, in [5], the authors clarified that it is unclear how to effectively use the 
resulting semantics. A semantics based on Nmatrices for the modal logics KT and S4 was presented in [8] by Gratz, which includes a method to extend a partial model in that semantics into a total one, resulting in an effective semantics. We chose here to focus on K, which is a weaker logic, forming a common basis for all other normal modal logics. By deleting one out of four truth values, we obtain corresponding results for KT as well. The semantics that we present here is similar in nature to the one presented in [8]; however: (i) the truth tables are different, as we intentionally enforced the many-valued tables of the classical connectives to be obtained by a straightforward duplication of truth values from the original two-valued truth tables; and (ii) the semantic condition for levels of valuations that we define here is inductive, where each level relies on lower levels (thus refraining from a definition of a more cyclic nature such as the one in [8], which is better understood operationally). A variant of the semantics from [14] was also introduced and studied in [12], which focuses on infinite rather than partial valuations and does not consider the ability to perform effective automated reasoning. A complete proof-theoretic characterization, in terms of sequent calculi, of the various levels of valuations was not given in any of the above works. Nor did any of them give an effective semantics for K, the most basic modal logic.

Non-deterministic matrices were introduced in [2], and have since become a useful tool for investigating non-classical logics and proof systems (see [1] for a survey). They generalize (deterministic) matrices [15] by allowing a non-deterministic choice of truth values in the truth tables. Like matrices, Nmatrices
enjoy the semantic analyticity property, which allows one to extend a partial 
valuation into a full one. Our semantic framework can be viewed as a further 
refinement of non-deterministic matrices, namely restricted non-deterministic 
matrices, introduced in [6]. 


2 Preliminaries 


In this section we provide the necessary definitions about Nmatrices following [1]. We assume a propositional language $\mathcal{L}$ with countably infinitely many atomic variables $p_1, p_2, \ldots$. When there is no room for confusion, we identify $\mathcal{L}$ with its set of well-formed formulas (e.g., when writing $\varphi \in \mathcal{L}$). We write $sub(\varphi)$ for the set of subformulas of a formula $\varphi$. This notation is extended to sets of formulas in the natural way.


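Valuations. In the context of a set $\mathcal{V}$ of “truth values”, a valuation is a function $v$ from some domain $Dom(v) \subseteq \mathcal{L}$ to $\mathcal{V}$. For a set $\mathcal{F} \subseteq \mathcal{L}$, an $\mathcal{F}$-valuation is a valuation with domain $\mathcal{F}$. (In particular, an $\mathcal{L}$-valuation is defined on all formulas.) For $X \subseteq \mathcal{V}$, we write $v^{-1}[X]$ for the set $\{\varphi \mid v(\varphi) \in X\}$. For $x \in \mathcal{V}$, we also write $v^{-1}[x]$ for the set $\{\varphi \mid v(\varphi) = x\}$.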


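Definition 1. Let $\mathcal{D} \subseteq \mathcal{V}$ be a set of “designated truth values”. A valuation $v$ $\mathcal{D}$-satisfies a formula $\varphi$, denoted by $v \models_{\mathcal{D}} \varphi$, if $v(\varphi) \in \mathcal{D}$. For a set $X$ of formulas, we write $v \models_{\mathcal{D}} X$ if $v \models_{\mathcal{D}} \varphi$ for every $\varphi \in X$.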


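Notation 2. Let $\mathcal{D} \subseteq \mathcal{V}$ be a set of designated truth values and $\mathbb{V}$ be a set of valuations. For sets $L, R$ of formulas, we write $L \vdash^{\mathbb{V}}_{\mathcal{D}} R$ if for every $v \in \mathbb{V}$, $v \models_{\mathcal{D}} L$ implies that $v \models_{\mathcal{D}} \varphi$ for some $\varphi \in R$. We omit $L$ or $R$ in this notation when they are empty (e.g., when writing $\vdash^{\mathbb{V}}_{\mathcal{D}} R$), and set parentheses for singletons (e.g., when writing $L \vdash^{\mathbb{V}}_{\mathcal{D}} \varphi$).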
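For a finite set of valuations, the preimage notation, Definition 1, and the relation $L \vdash^{\mathbb{V}}_{\mathcal{D}} R$ of Notation 2 can be computed directly. The sketch below is ours: valuations are Python dictionaries from formulas to truth values, the truth values are the strings `"T"`, `"t"`, `"F"`, `"f"` with $\mathcal{D} = \{T, t\}$ as used for K in Sect. 3.2, and every formula in $L \cup R$ is assumed to lie in the domain of every valuation (a missing one raises `KeyError`). The names `preimage`, `satisfies`, and `entails` are hypothetical.

```python
DESIGNATED = {"T", "t"}  # the designated values used later for K


def preimage(v: dict, values: set) -> set:
    """v^{-1}[X]: the formulas that v maps into X."""
    return {phi for phi, value in v.items() if value in values}


def satisfies(v: dict, formulas, designated=DESIGNATED) -> bool:
    """v ⊨_D X: every formula of X gets a designated value."""
    return all(v[phi] in designated for phi in formulas)


def entails(valuations, left, right, designated=DESIGNATED) -> bool:
    """L ⊢^V_D R: each v in V that D-satisfies L D-satisfies some φ in R."""
    return all(
        any(v[phi] in designated for phi in right)
        for v in valuations
        if satisfies(v, left, designated)
    )
```

With `vals = [{"p": "T", "q": "f"}, {"p": "F", "q": "t"}]`, the call `entails(vals, ["p"], ["q"])` is false (the first valuation is a counterexample), while `entails(vals, [], ["p", "q"])` is true.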
Nmatrices. An Nmatrix $\mathcal{M}$ for $\mathcal{L}$ is a triple of the form $(\mathcal{V}, \mathcal{D}, \mathcal{O})$, where $\mathcal{V}$ is a set of truth values, $\mathcal{D} \subseteq \mathcal{V}$ is a set of designated truth values, and $\mathcal{O}$ is a function assigning a truth table $\mathcal{V}^n \to P(\mathcal{V}) \setminus \{\emptyset\}$ to every $n$-ary connective $\diamond$ of $\mathcal{L}$ (which assigns a set of possible values to each tuple of values). In the context of an Nmatrix $\mathcal{M} = (\mathcal{V}, \mathcal{D}, \mathcal{O})$, we often denote $\mathcal{O}(\diamond)$ by $\tilde{\diamond}$.

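An $\mathcal{F}$-valuation $v$ is $\mathcal{M}$-legal if $v(\varphi) \in \text{pos-val}(\varphi, \mathcal{M}, v)$ for every formula $\varphi \in \mathcal{F}$ whose immediate subformulas are contained in $\mathcal{F}$, where $\text{pos-val}(\varphi, \mathcal{M}, v)$ is defined by:

1. $\text{pos-val}(p, \mathcal{M}, v) = \mathcal{V}$ for every atomic formula $p$.
2. $\text{pos-val}(\diamond(\psi_1, \ldots, \psi_n), \mathcal{M}, v) = \tilde{\diamond}(v(\psi_1), \ldots, v(\psi_n))$ for every non-atomic formula $\diamond(\psi_1, \ldots, \psi_n)$.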


In other words, there is no restriction regarding the values assigned to atomic 
formulas, whereas the values of compound formulas should respect the truth 
tables. 


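Lemma 1 ([1]). Let $\mathcal{F} \subseteq \mathcal{L}$ be a set closed under subformulas and $\mathcal{M}$ an Nmatrix for $\mathcal{L}$. Then every $\mathcal{M}$-legal $\mathcal{F}$-valuation $v$ can be extended to an $\mathcal{M}$-legal $\mathcal{L}$-valuation.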
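$\mathcal{M}$-legality and the extension property of Lemma 1 can be sketched together for finite domains. The Python below is an illustration under our own assumptions: formulas are strings (atoms) or tuples `(connective, *arguments)`; a truth table maps a connective to a function returning the set of allowed values; and the single table shown, for $\supset$ over four values, is a hypothetical example built by duplicating the classical truth values, not a reproduction of the paper's tables. The function `extend` processes formulas by increasing size, so immediate subformulas always receive values first; any element of pos-val may be chosen, and `min` only makes the choice deterministic. All names are hypothetical.

```python
VALUES = {"T", "t", "F", "f"}
DESIGNATED = {"T", "t"}
UNDESIGNATED = VALUES - DESIGNATED


def implies_table(a: str, b: str) -> set:
    """Hypothetical non-deterministic table for ⊃ (classical truth duplicated)."""
    classical = a not in DESIGNATED or b in DESIGNATED
    return DESIGNATED if classical else UNDESIGNATED


TABLES = {"⊃": implies_table}


def pos_val(phi, v: dict) -> set:
    """pos-val(φ, M, v): all values for atoms, the table entry otherwise."""
    if isinstance(phi, str):
        return set(VALUES)
    connective, *args = phi
    return set(TABLES[connective](*(v[a] for a in args)))


def is_legal(v: dict) -> bool:
    """Every compound φ whose arguments are all in Dom(v) must respect its table."""
    return all(
        isinstance(phi, str)
        or not all(a in v for a in phi[1:])
        or value in pos_val(phi, v)
        for phi, value in v.items()
    )


def subformulas(phi) -> set:
    if isinstance(phi, str):
        return {phi}
    return {phi}.union(*(subformulas(a) for a in phi[1:]))


def size(phi) -> int:
    return 1 if isinstance(phi, str) else 1 + sum(size(a) for a in phi[1:])


def extend(v: dict, targets) -> dict:
    """Extend a legal valuation (closed domain) to sub(targets), smallest formulas first."""
    w = dict(v)
    closure = set().union(*(subformulas(t) for t in targets))
    for phi in sorted(closure - set(w), key=size):
        w[phi] = min(pos_val(phi, w))
    return w
```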


3 The Modal Logic K 


In this section we introduce a novel effective semantics for the modal logic K. We first present a known proof system for this logic (Sect. 3.1), and then our semantics (Sect. 3.2). From here on, we assume that the language $\mathcal{L}$ consists of the connectives $\supset$, $\wedge$, $\vee$, $\neg$ and $\Box$ with their usual arities. The standard $\Diamond$ operator can be defined as a macro $\Diamond\varphi = \neg\Box\neg\varphi$. Obviously, using De Morgan rules, fewer connectives can be used. However, we chose this set of connectives in order to have a primitive language rich enough for the examples that we include throughout the paper.


3.1 Proof System 


Figure 1 presents a Gentzen-style calculus, denoted by $G_K$, for the modal logic K that was proven to be equivalent to the original formulation of the logic as a Hilbert system (see, e.g., [16]). We take sequents to be pairs $(\Gamma, \Delta)$ of finite sets of formulas. For readability, we write $\Gamma \Rightarrow \Delta$ instead of $(\Gamma, \Delta)$ and use standard notations such as $\Gamma, \varphi \Rightarrow \psi$ instead of $(\Gamma \cup \{\varphi\}) \Rightarrow \{\psi\}$.

The (CUT) rule is included in $G_K$ for convenience, but applications of (CUT) can be eliminated from derivations (see, e.g., [11]). Since the focus of this paper is semantics rather than cut-elimination, we allow ourselves to use cut freely and do not distinguish derivations that use it from derivations that do not. We write $\vdash_{G_K} \Gamma \Rightarrow \Delta$ if there is a derivation of the sequent $\Gamma \Rightarrow \Delta$ in the calculus $G_K$.

In the sequel, we provide a semantic characterization of $\vdash_{G_K}$. It is based on a more refined notion of derivability that takes into account: (i) the set $\mathcal{F}$ of formulas used in the derivation; and (ii) the (K)-depth of the derivation, as defined next.
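$$
\dfrac{\Gamma \Rightarrow \Delta}{\Gamma', \Gamma \Rightarrow \Delta, \Delta'}\ (\mathrm{WEAK})
\qquad
\dfrac{}{\Gamma, \varphi \Rightarrow \varphi, \Delta}\ (\mathrm{ID})
\qquad
\dfrac{\Gamma \Rightarrow \varphi, \Delta \qquad \Gamma, \varphi \Rightarrow \Delta}{\Gamma \Rightarrow \Delta}\ (\mathrm{CUT})
\qquad
\dfrac{\Gamma \Rightarrow \varphi}{\Box\Gamma \Rightarrow \Box\varphi}\ (\mathrm{K})
$$

$$
\dfrac{\Gamma \Rightarrow \varphi, \Delta}{\Gamma, \neg\varphi \Rightarrow \Delta}\ (\neg\Rightarrow)
\qquad
\dfrac{\Gamma, \varphi \Rightarrow \Delta}{\Gamma \Rightarrow \neg\varphi, \Delta}\ (\Rightarrow\neg)
\qquad
\dfrac{\Gamma \Rightarrow \varphi, \Delta \qquad \Gamma, \psi \Rightarrow \Delta}{\Gamma, \varphi \supset \psi \Rightarrow \Delta}\ (\supset\Rightarrow)
\qquad
\dfrac{\Gamma, \varphi \Rightarrow \psi, \Delta}{\Gamma \Rightarrow \varphi \supset \psi, \Delta}\ (\Rightarrow\supset)
$$

$$
\dfrac{\Gamma, \varphi, \psi \Rightarrow \Delta}{\Gamma, \varphi \wedge \psi \Rightarrow \Delta}\ (\wedge\Rightarrow)
\qquad
\dfrac{\Gamma \Rightarrow \varphi, \Delta \qquad \Gamma \Rightarrow \psi, \Delta}{\Gamma \Rightarrow \varphi \wedge \psi, \Delta}\ (\Rightarrow\wedge)
\qquad
\dfrac{\Gamma, \varphi \Rightarrow \Delta \qquad \Gamma, \psi \Rightarrow \Delta}{\Gamma, \varphi \vee \psi \Rightarrow \Delta}\ (\vee\Rightarrow)
\qquad
\dfrac{\Gamma \Rightarrow \varphi, \psi, \Delta}{\Gamma \Rightarrow \varphi \vee \psi, \Delta}\ (\Rightarrow\vee)
$$

Fig. 1. The sequent calculus $G_K$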


Definition 3. A derivation of a sequent $\Gamma \Rightarrow \Delta$ in $G_K$ is a tree in which the nodes are labeled with sequents, the root is labeled with $\Gamma \Rightarrow \Delta$, and every node is the result of an application of some rule of $G_K$ where the premises are the labels of its children in the tree. A derivation is called an $\mathcal{F}$-derivation if it employs only sequents composed of formulas from $\mathcal{F}$. The (K)-depth of a derivation is the maximal number of applications of rule (K) in any of the branches of the derivation.


Notation 4. We write $\vdash^{\mathcal{F},m}_{G_K}\Gamma\Rightarrow\Delta$ if there is a derivation of $\Gamma\Rightarrow\Delta$ in $G_K$ in which only $\mathcal{F}$-sequents occur and that has (K)-depth at most $m$. We drop $\mathcal{F}$ from this notation when $\mathcal{F}=\mathcal{L}$, and drop $m$ to dismiss the restriction regarding the (K)-depth.

Example 1. Let $\varphi=\Box(p_1\wedge p_2)\supset(\Box p_1\wedge\Box p_2)$ and $\mathcal{F}=sub(\varphi)$. The following is a derivation of $\Rightarrow\varphi$ in $G_K$ that only uses $\mathcal{F}$-formulas and has (K)-depth 1 (though the number of applications of (K) in the derivation is 2):


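$$
\dfrac{
  \dfrac{
    \dfrac{\dfrac{\dfrac{}{p_1, p_2 \Rightarrow p_1}\,(\text{ID})}{p_1 \wedge p_2 \Rightarrow p_1}\,(\wedge\Rightarrow)}{\Box(p_1 \wedge p_2) \Rightarrow \Box p_1}\,(\text{K})
    \qquad
    \dfrac{\dfrac{\dfrac{}{p_1, p_2 \Rightarrow p_2}\,(\text{ID})}{p_1 \wedge p_2 \Rightarrow p_2}\,(\wedge\Rightarrow)}{\Box(p_1 \wedge p_2) \Rightarrow \Box p_2}\,(\text{K})
  }{\Box(p_1 \wedge p_2) \Rightarrow \Box p_1 \wedge \Box p_2}\,(\Rightarrow\wedge)
}{\Rightarrow \Box(p_1 \wedge p_2) \supset (\Box p_1 \wedge \Box p_2)}\,(\Rightarrow\supset)
$$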
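The difference between the (K)-depth of a derivation and its number of (K) applications can be checked mechanically. The following is a minimal Python sketch, assuming our own representation of derivations as trees of `Node` objects with ASCII sequents and ad hoc rule labels such as `"AND-L"` (none of these names come from the paper); it rebuilds the derivation of Example 1 and computes both quantities.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a G_K derivation: its sequent, the rule applied, and the premises."""
    sequent: str
    rule: str
    premises: List["Node"] = field(default_factory=list)


def k_depth(d: Node) -> int:
    """Maximal number of (K) applications along any branch (Definition 3)."""
    below = max((k_depth(p) for p in d.premises), default=0)
    return below + (1 if d.rule == "K" else 0)


def k_count(d: Node) -> int:
    """Total number of (K) applications anywhere in the tree."""
    return (1 if d.rule == "K" else 0) + sum(k_count(p) for p in d.premises)


def branch(i: int) -> Node:
    """(ID), then (AND-L), then (K), ending in [](p1 & p2) => []p_i."""
    leaf = Node(f"p1, p2 => p{i}", "ID")
    conj = Node(f"p1 & p2 => p{i}", "AND-L", [leaf])
    return Node(f"[](p1 & p2) => []p{i}", "K", [conj])


# The derivation of Example 1: two (K) applications, but on different branches.
example1 = Node(
    "=> [](p1 & p2) -> ([]p1 & []p2)", "IMP-R",
    [Node("[](p1 & p2) => []p1 & []p2", "AND-R", [branch(1), branch(2)])],
)

print(k_depth(example1), k_count(example1))  # prints: 1 2
```

Because `k_depth` takes a maximum over premises whereas `k_count` sums over them, the two differ exactly when (K) is applied on parallel branches, as happens here.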


3.2 Semantics 


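The semantics is based on a four-valued Nmatrix stratified with “levels”, where for every $m$, legal valuations of level $m+1$ are a subset of legal valuations of level $m$. The underlying Nmatrix, denoted by $\mathcal{M}_K$, is obtained by duplicating the classical truth values. Thus, the sets of truth values and of designated truth values are given by:

$$\mathcal{V}_4\stackrel{\text{def}}{=}\{\mathsf{T},\mathsf{t},\mathsf{F},\mathsf{f}\}\qquad\mathcal{D}\stackrel{\text{def}}{=}\{\mathsf{T},\mathsf{t}\}$$

The truth tables are as follows (rows give the value of the left argument; we have $\overline{\mathcal{D}}=\{\mathsf{F},\mathsf{f}\}$):

$$
\begin{array}{c|cccc}
\tilde{\supset} & \mathsf{T} & \mathsf{t} & \mathsf{F} & \mathsf{f}\\\hline
\mathsf{T} & \mathcal{D} & \mathcal{D} & \overline{\mathcal{D}} & \overline{\mathcal{D}}\\
\mathsf{t} & \mathcal{D} & \mathcal{D} & \overline{\mathcal{D}} & \overline{\mathcal{D}}\\
\mathsf{F} & \mathcal{D} & \mathcal{D} & \mathcal{D} & \mathcal{D}\\
\mathsf{f} & \mathcal{D} & \mathcal{D} & \mathcal{D} & \mathcal{D}
\end{array}
\quad
\begin{array}{c|cccc}
\tilde{\wedge} & \mathsf{T} & \mathsf{t} & \mathsf{F} & \mathsf{f}\\\hline
\mathsf{T} & \mathcal{D} & \mathcal{D} & \overline{\mathcal{D}} & \overline{\mathcal{D}}\\
\mathsf{t} & \mathcal{D} & \mathcal{D} & \overline{\mathcal{D}} & \overline{\mathcal{D}}\\
\mathsf{F} & \overline{\mathcal{D}} & \overline{\mathcal{D}} & \overline{\mathcal{D}} & \overline{\mathcal{D}}\\
\mathsf{f} & \overline{\mathcal{D}} & \overline{\mathcal{D}} & \overline{\mathcal{D}} & \overline{\mathcal{D}}
\end{array}
\quad
\begin{array}{c|cccc}
\tilde{\vee} & \mathsf{T} & \mathsf{t} & \mathsf{F} & \mathsf{f}\\\hline
\mathsf{T} & \mathcal{D} & \mathcal{D} & \mathcal{D} & \mathcal{D}\\
\mathsf{t} & \mathcal{D} & \mathcal{D} & \mathcal{D} & \mathcal{D}\\
\mathsf{F} & \mathcal{D} & \mathcal{D} & \overline{\mathcal{D}} & \overline{\mathcal{D}}\\
\mathsf{f} & \mathcal{D} & \mathcal{D} & \overline{\mathcal{D}} & \overline{\mathcal{D}}
\end{array}
\quad
\begin{array}{c|c}
x & \tilde{\neg}x\\\hline
\mathsf{T} & \overline{\mathcal{D}}\\
\mathsf{t} & \overline{\mathcal{D}}\\
\mathsf{F} & \mathcal{D}\\
\mathsf{f} & \mathcal{D}
\end{array}
\quad
\begin{array}{c|c}
x & \tilde{\Box}x\\\hline
\mathsf{T} & \mathcal{D}\\
\mathsf{t} & \overline{\mathcal{D}}\\
\mathsf{F} & \mathcal{D}\\
\mathsf{f} & \overline{\mathcal{D}}
\end{array}
$$

We employ the following notations for subsets of truth values: $\mathsf{TF}\stackrel{\text{def}}{=}\{\mathsf{T},\mathsf{F}\}$ and $\mathsf{tf}\stackrel{\text{def}}{=}\{\mathsf{t},\mathsf{f}\}$.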


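For the classical connectives, the truth tables of $\mathcal{M}_K$ treat $\mathsf{t}$ just like $\mathsf{T}$, and $\mathsf{f}$ just like $\mathsf{F}$, and are essentially two-valued: the result is either $\mathcal{D}$ or $\overline{\mathcal{D}}$, and it depends solely on whether the inputs are elements of $\mathcal{D}$ or $\overline{\mathcal{D}}$. Thus, for the language without $\Box$, this Nmatrix provides a (non-economic) four-valued semantics for classical logic.

While the output for $\Box$ is also always $\mathcal{D}$ or $\overline{\mathcal{D}}$, it differentiates between $\mathsf{T}$ (which results in $\mathcal{D}$) and $\mathsf{t}$ (which results in $\overline{\mathcal{D}}$), and similarly between $\mathsf{F}$ and $\mathsf{f}$. In fact, this table is captured by the condition: $\tilde{\Box}(x)=\mathcal{D}$ iff $x\in\mathsf{TF}$.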


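Example 2. Let $\mathcal{F}=sub(\varphi)$ where $\varphi$ is the formula from Example 1. The following valuation $v$ is an $\mathcal{F}$-valuation that is $\mathcal{M}_K$-legal:

$$v(p_1)=v(p_2)=\mathsf{f}\qquad v(p_1\wedge p_2)=\mathsf{F}\qquad v(\Box p_1)=v(\Box p_2)=v(\Box p_1\wedge\Box p_2)=\mathsf{F}$$
$$v(\Box(p_1\wedge p_2))=\mathsf{T}\qquad v(\Box(p_1\wedge p_2)\supset(\Box p_1\wedge\Box p_2))=\mathsf{F}$$

To show that it is $\mathcal{M}_K$-legal, one needs to verify that $v(\psi)\in\textit{pos-val}(\psi,\mathcal{M}_K,v)$ for each $\psi\in\mathcal{F}$. For example, $v(p_1)=\mathsf{f}\in\mathcal{V}_4=\textit{pos-val}(p_1,\mathcal{M}_K,v)$. As another example, since $v(p_1)=\mathsf{f}$, we have that $\textit{pos-val}(\Box p_1,\mathcal{M}_K,v)=\tilde{\Box}(\mathsf{f})=\{\mathsf{F},\mathsf{f}\}$, and hence $v(\Box p_1)=\mathsf{F}\in\textit{pos-val}(\Box p_1,\mathcal{M}_K,v)$. Notice that $v$ does not satisfy $\varphi$.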
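The legality check that Example 2 carries out by hand can be mechanized. Below is a minimal Python sketch of the tables of $\mathcal{M}_K$, assuming an encoding of our own: formulas are nested tuples such as `("box", A)`, truth values are the strings `"T"`, `"t"`, `"F"`, `"f"`, and `pos_val` and `is_legal` are hypothetical helper names standing for $\textit{pos-val}$ and $\mathcal{M}_K$-legality.

```python
# Truth values of M_K and the subsets its tables refer to (our own encoding).
V4 = {"T", "t", "F", "f"}
D = {"T", "t"}      # designated
ND = {"F", "f"}     # non-designated
TF = {"T", "F"}     # read as "holds in every accessible world"


def pos_val(phi, v):
    """Values M_K allows for phi, given v on phi's immediate subformulas."""
    op = phi[0]
    if op == "atom":
        return V4
    d = [v[c] in D for c in phi[1:]]
    if op == "not":
        ok = not d[0]
    elif op == "and":
        ok = d[0] and d[1]
    elif op == "or":
        ok = d[0] or d[1]
    elif op == "imp":
        ok = (not d[0]) or d[1]
    elif op == "box":
        ok = v[phi[1]] in TF    # box is designated iff its argument is T or F
    else:
        raise ValueError(f"unknown connective {op!r}")
    return D if ok else ND


def is_legal(v):
    """v (a dict over a subformula-closed domain) is M_K-legal."""
    return all(v[phi] in pos_val(phi, v) for phi in v)


p1, p2 = ("atom", "p1"), ("atom", "p2")
a = ("and", p1, p2)
b1, b2, ba = ("box", p1), ("box", p2), ("box", a)
rhs = ("and", b1, b2)
phi = ("imp", ba, rhs)     # the formula of Example 1

v = {p1: "f", p2: "f", a: "F", b1: "F", b2: "F", rhs: "F", ba: "T", phi: "F"}
print(is_legal(v), v[phi] in D)   # prints: True False
```

Setting `v[b1]` to `"T"` instead makes `is_legal` fail, because `pos_val(b1, v)` is `{"F", "f"}` whenever `v[p1]` is `"f"`.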


The truth table for $\Box$ can be understood via “possible worlds” intuition. Our four truth values are intuitively captured as follows, assuming a given formula $\psi$ and a world $w$:

- $\mathsf{T}$: $\psi$ holds in $w$ and in every world accessible from $w$;
- $\mathsf{t}$: $\psi$ holds in $w$ but does not hold in some world accessible from $w$;
- $\mathsf{F}$: $\psi$ does not hold in $w$ but holds in every world accessible from $w$; and
- $\mathsf{f}$: $\psi$ does not hold in $w$ and does not hold in some world accessible from $w$.

In the possible worlds semantics, $\Box\psi$ holds in some world $w$ iff $\psi$ holds in every world that is accessible from $w$, which intuitively explains the table for $\Box$. Note that non-determinism is inherent here. For example, if $\psi$ holds in $w$ and in every world accessible from $w$ (i.e., has value $\mathsf{T}$), we know that $\Box\psi$ holds in $w$, but we do not know whether $\Box\psi$ holds in every world accessible from $w$ (thus $\Box\psi$ has value $\mathsf{T}$ or $\mathsf{t}$).

Now, the Nmatrix $\mathcal{M}_K$ by itself is not adequate for the modal logic K (as Examples 1 and 2 demonstrate). What is missing is the relation between the choices we make to resolve non-determinism for different formulas. Continuing with the possible worlds intuition, we observe that if a formula $\varphi$ follows from a set of formulas $\Sigma$ that hold in all accessible worlds (i.e., $\varphi$ follows from formulas whose truth value is $\mathsf{T}$ or $\mathsf{F}$), then $\varphi$ itself should hold in all accessible worlds (i.e., $\varphi$'s truth value should be $\mathsf{T}$ or $\mathsf{F}$). Directly encoding this condition requires us to consider a set $\mathbb{V}$ of $\mathcal{M}_K$-legal $\mathcal{F}$-valuations for which the following holds (recall Notation 2 from Sect. 2):

$$\forall v\in\mathbb{V}.\ \forall\varphi\in\mathcal{F}.\ \left(v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}}\varphi\implies v(\varphi)\in\mathsf{TF}\right)\qquad(\text{necessitation})$$


In turn, to obtain completeness we take a maximal set V that satisfies the 
necessitation condition. While it is possible to define this set of valuations as the 
greatest fixpoint of necessitation, following previous work, we find it convenient 
to reach this set using “levels”: 


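Definition 5. The set $\mathbb{V}^{\mathcal{F},m}_K$ is inductively defined as follows:

- $\mathbb{V}^{\mathcal{F},0}_K$ is the set of $\mathcal{M}_K$-legal $\mathcal{F}$-valuations.
- $\mathbb{V}^{\mathcal{F},m+1}_K\stackrel{\text{def}}{=}\{v\in\mathbb{V}^{\mathcal{F},m}_K\mid\forall\varphi\in\mathcal{F}.\ v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}^{\mathcal{F},m}_K}\varphi\implies v(\varphi)\in\mathsf{TF}\}$

We also define:

$$\mathbb{V}^{\mathcal{F}}_K\stackrel{\text{def}}{=}\bigcap_{m\geq 0}\mathbb{V}^{\mathcal{F},m}_K\qquad\mathbb{V}^{m}_K\stackrel{\text{def}}{=}\mathbb{V}^{\mathcal{L},m}_K\qquad\mathbb{V}_K\stackrel{\text{def}}{=}\bigcap_{m\geq 0}\mathbb{V}^{m}_K$$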


Similarly to the idea originated by Kearns in [10], valuations are arranged in levels, which are inductively defined. The first level, $\mathbb{V}^{\mathcal{F},0}_K$, consists solely of the $\mathcal{M}_K$-legal valuations with domain $\mathcal{F}$. For each $m>0$, the $m$'th level is defined as a subset of the $(m-1)$'th level, with an additional constraint: a valuation $v$ from level $m-1$ remains in level $m$ only if every formula $\varphi\in\mathcal{F}$ entailed (at level $m-1$) by the set of formulas that $v$ assigns a value from $\mathsf{TF}$ is itself assigned a value from $\mathsf{TF}$ by $v$. As we show below, at the “end” of this process, by taking $\bigcap_{m\geq 0}\mathbb{V}^{\mathcal{F},m}_K$ one obtains the greatest set $\mathbb{V}$ satisfying the necessitation condition.


Remark 1. The necessitation condition is similar to the one provided in [8] for the modal logics KT and S4. In contrast, the condition from [4,10,14] is simpler and does not involve $v^{-1}[\mathsf{TF}]$ at all, but also does not give rise to decision procedures.


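Example 3. Following Example 2, while the formula $\varphi$ is not satisfied by all valuations in $\mathbb{V}^{\mathcal{F},0}_K$, it is satisfied by all valuations in $\mathbb{V}^{\mathcal{F},m}_K$ for every $m>0$. In particular, the valuation $v$ from Example 2 is not in $\mathbb{V}^{\mathcal{F},1}_K$: we have $p_1\wedge p_2\vdash^{\mathbb{V}^{\mathcal{F},0}_K}p_1$ and $v(p_1\wedge p_2)=\mathsf{F}$ (so $p_1\wedge p_2\in v^{-1}[\mathsf{TF}]$), but $v(p_1)=\mathsf{f}\notin\mathsf{TF}$.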
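For a finite $\mathcal{F}$, Definition 5 can be executed directly. The Python sketch below enumerates the $\mathcal{M}_K$-legal $\mathcal{F}$-valuations, applies the level step of Definition 5 until a level coincides with its successor (by Lemma 4, that level is $\mathbb{V}^{\mathcal{F}}_K$), and checks the claims of Example 3. The tuple encoding of formulas, the bitmask representation of sets of formulas, and all function names are our own assumptions; the brute-force search is meant as an illustration, not as an efficient implementation.

```python
# Brute-force levels V^{F,m}_K for a finite F (sketch; encoding is our own).
V4 = ("T", "t", "F", "f")
D, TF = {"T", "t"}, {"T", "F"}


def size(phi):
    return 1 if phi[0] == "atom" else 1 + sum(size(c) for c in phi[1:])


def sub(phi, acc=None):
    """All subformulas of phi; formulas are nested tuples such as ("box", A)."""
    acc = set() if acc is None else acc
    acc.add(phi)
    if phi[0] != "atom":
        for c in phi[1:]:
            sub(c, acc)
    return acc


def pos_val(phi, v):
    """Values the tables of M_K allow for phi, given v on its immediate subformulas."""
    if phi[0] == "atom":
        return V4
    d = [v[c] in D for c in phi[1:]]
    ok = {"not": lambda: not d[0], "and": lambda: d[0] and d[1],
          "or": lambda: d[0] or d[1], "imp": lambda: (not d[0]) or d[1],
          "box": lambda: v[phi[1]] in TF}[phi[0]]()
    return ("T", "t") if ok else ("F", "f")


def legal_valuations(F):
    """Level 0: every M_K-legal F-valuation, assigning subformulas first."""
    vals = [{}]
    for phi in sorted(F, key=size):
        vals = [{**v, phi: x} for v in vals for x in pos_val(phi, v)]
    return vals


def next_level(F, V):
    """Definition 5: keep v iff each phi with v^{-1}[TF] |-^V phi gets a TF value."""
    bit = {phi: 1 << i for i, phi in enumerate(F)}
    mask = lambda u, S: sum(bit[p] for p in F if u[p] in S)
    sats = [mask(u, D) for u in V]            # formulas designated by each u in V
    kept = []
    for v in V:
        gamma = mask(v, TF)                   # v^{-1}[TF] as a bitmask
        entailed = (1 << len(F)) - 1          # intersection over all models of gamma
        for s in sats:
            if (s & gamma) == gamma:
                entailed &= s
        if (entailed & ~gamma) == 0:          # every entailed formula has a TF value
            kept.append(v)
    return kept


def levels(F):
    """[V^{F,0}, V^{F,1}, ...], stopping once a level equals its successor."""
    out = [legal_valuations(F)]
    while True:
        nxt = next_level(F, out[-1])
        if len(nxt) == len(out[-1]):          # nested sets: same size means equal
            return out
        out.append(nxt)


p1, p2 = ("atom", "p1"), ("atom", "p2")
a, b1, b2 = ("and", p1, p2), ("box", p1), ("box", p2)
rhs = ("and", b1, b2)
phi = ("imp", ("box", a), rhs)                # Example 1
F = sorted(sub(phi), key=size)
L = levels(F)

v2 = {p1: "f", p2: "f", a: "F", b1: "F", b2: "F", rhs: "F", ("box", a): "T", phi: "F"}
print(v2 in L[0], v2 in L[1])                 # prints: True False
print([all(v[phi] in D for v in V) for V in L])
```

The second printed list should start with `False` (level 0 admits the valuation of Example 2) and contain only `True` afterwards; the sketch does not assume in advance how many levels are needed before stabilization.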
For each set $\mathcal{F}\subseteq\mathcal{L}$ and $m\geq 0$, we obtain a consequence relation $\vdash^{\mathbb{V}^{\mathcal{F},m}_K}$ between sets of $\mathcal{F}$-formulas. Disregarding $m$, we also obtain the relation $\vdash^{\mathbb{V}^{\mathcal{F}}_K}$ (for every $\mathcal{F}$), which we will show to be sound and complete for K. We note that all these relations are compact. The proof of the following theorem relies on the completeness theorems that we prove in Sect. 4.

Theorem 1 (Compactness).

1. For every $m\geq 0$, if $L\vdash^{\mathbb{V}^{\mathcal{F},m}_K}R$, then $\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m}_K}\Delta$ for some finite $\Gamma\subseteq L$ and $\Delta\subseteq R$.
2. If $L\vdash^{\mathbb{V}^{\mathcal{F}}_K}R$, then $\Gamma\vdash^{\mathbb{V}^{\mathcal{F}}_K}\Delta$ for some finite $\Gamma\subseteq L$ and $\Delta\subseteq R$.


Now, to show that $\mathbb{V}^{\mathcal{F}}_K$ is indeed the largest set $\mathbb{V}$ of $\mathcal{M}_K$-legal $\mathcal{F}$-valuations that satisfies necessitation, we use the following two lemmas. The first is a general construction that relies only on the use of finite-valued valuation functions.


Lemma 2. Let $v_0,v_1,v_2,\ldots$ be an infinite sequence of valuations over a common domain $\mathcal{F}$. Then, there exists some $v$ such that for every finite set $\mathcal{F}'\subseteq\mathcal{F}$ of formulas and $m\geq 0$, we have $v|_{\mathcal{F}'}=v_k|_{\mathcal{F}'}$ for some $k\geq m$.

Proof (Outline). First, if $\mathcal{F}$ is finite, then there is only a finite number of $\mathcal{F}$-valuations, and there must exist some $\mathcal{F}$-valuation $v_n$ that occurs infinitely often in the sequence $v_0,v_1,\ldots$. We take $v=v_n$, and the required property trivially holds. Now, assume that $\mathcal{F}$ is infinite, and let $\psi_0,\psi_1,\ldots$ be an enumeration of the formulas in $\mathcal{F}$. For every $i\geq 0$, let $\mathcal{F}_i=\{\psi_0,\ldots,\psi_i\}$. We construct a sequence of infinite sets $A_0,A_1,\ldots\subseteq\mathbb{N}$ such that:

- For every $i\geq 0$, $A_{i+1}\subseteq A_i$.
- For every $0\leq j\leq i$, $a\in A_i$, and $b\in A_i$, $v_a(\psi_j)=v_b(\psi_j)$.

To do so, take some infinite set $A_0\subseteq\mathbb{N}$ such that $v_a(\psi_0)=v_b(\psi_0)$ for every $a,b\in A_0$ (such a set must exist since we have a finite number of truth values). Then, given $A_i$, we let $A_{i+1}$ be some infinite subset of $A_i$ such that $v_a(\psi_{i+1})=v_b(\psi_{i+1})$ for every $a,b\in A_{i+1}$. The valuation $v$ is defined by $v(\psi_i)=v_a(\psi_i)$ for some $a\in A_i$. The properties of the $A_i$'s ensure that $v$ is well defined, and it can be shown that it also satisfies the required property.


Using Lemma 2 and the compactness property, we can show the following: 


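Lemma 3. Let $v_0,v_1,\ldots$ be a sequence of valuations over a common domain $\mathcal{F}$ such that $v_m\in\mathbb{V}^{\mathcal{F},m}_K$ for every $m\geq 0$. Then, there exists some $v\in\mathbb{V}^{\mathcal{F}}_K$ such that for every $\varphi\in\mathcal{F}$, $v(\varphi)=v_m(\varphi)$ for some $m\geq 0$.

Proof (Outline). By Lemma 2, there exists some $v$ such that for every finite set $\mathcal{F}'$ of formulas, $v|_{\mathcal{F}'}=v_m|_{\mathcal{F}'}$ for arbitrarily large $m$. It is easy to verify that $v$ satisfies the required properties. In particular, one shows that $v\in\mathbb{V}^{\mathcal{F},m}_K$ for every $m\geq 0$ by induction on $m$. In that proof we use Theorem 1 to obtain a finite $\Gamma\subseteq v^{-1}[\mathsf{TF}]$ such that $\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m-1}_K}\varphi$ from the assumption that $v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}^{\mathcal{F},m-1}_K}\varphi$. Then, the above property of $v$ is applied with $\mathcal{F}'=\Gamma\cup\{\varphi\}$.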
Now, our characterization theorem easily follows:

Theorem 2. The set $\mathbb{V}^{\mathcal{F}}_K$ is the largest set $\mathbb{V}$ of $\mathcal{M}_K$-legal $\mathcal{F}$-valuations that satisfies necessitation.

Proof (Outline). To prove that $\mathbb{V}^{\mathcal{F}}_K$ satisfies necessitation, one needs to prove that if $v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}^{\mathcal{F}}_K}\varphi$, then also $v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}^{\mathcal{F},m}_K}\varphi$ for some $m\geq 0$. This is done using Lemma 3. For maximality, given a set $\mathbb{V}$, we assume by contradiction that there is some $m$ such that $\mathbb{V}\not\subseteq\mathbb{V}^{\mathcal{F},m}_K$, take a minimal such $m$, and show that it cannot be 0. Then, from $\mathbb{V}\subseteq\mathbb{V}^{\mathcal{F},m-1}_K$, it follows that actually $\mathbb{V}\subseteq\mathbb{V}^{\mathcal{F},m}_K$, and thus we obtain a contradiction.


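Finite Domain. By definition we have $\mathbb{V}^{\mathcal{F},0}_K\supseteq\mathbb{V}^{\mathcal{F},1}_K\supseteq\mathbb{V}^{\mathcal{F},2}_K\supseteq\ldots$ (and so, $\vdash^{\mathbb{V}^{\mathcal{F},0}_K}\subseteq\vdash^{\mathbb{V}^{\mathcal{F},1}_K}\subseteq\vdash^{\mathbb{V}^{\mathcal{F},2}_K}\subseteq\ldots$). Next, we show that when $\mathcal{F}$ is finite, this sequence must converge.

Lemma 4. Suppose that $\mathbb{V}^{\mathcal{F},m}_K=\mathbb{V}^{\mathcal{F},m+1}_K$ for some $m\geq 0$. Then, $\mathbb{V}^{\mathcal{F}}_K=\mathbb{V}^{\mathcal{F},m}_K$.

Lemma 5. For a finite set $\mathcal{F}$ of formulas, $\mathbb{V}^{\mathcal{F}}_K=\mathbb{V}^{\mathcal{F},4^{|\mathcal{F}|}}_K$.

Proof. The left-to-right inclusion follows from our definitions. For the right-to-left inclusion, note that by Lemma 4, $\mathbb{V}^{\mathcal{F},m}_K=\mathbb{V}^{\mathcal{F},m+1}_K$ implies that $\mathbb{V}^{\mathcal{F},k}_K=\mathbb{V}^{\mathcal{F},m}_K$ for every $k\geq m$. Thus, it suffices to show that $\mathbb{V}^{\mathcal{F},m}_K=\mathbb{V}^{\mathcal{F},m+1}_K$ for some $0\leq m\leq 4^{|\mathcal{F}|}$. Indeed, otherwise we have $\mathbb{V}^{\mathcal{F},0}_K\supsetneq\mathbb{V}^{\mathcal{F},1}_K\supsetneq\ldots\supsetneq\mathbb{V}^{\mathcal{F},4^{|\mathcal{F}|}+1}_K$, but this is impossible since there are only $4^{|\mathcal{F}|}$ functions from $\mathcal{F}$ to $\mathcal{V}_4$.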


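Optimized Tables. Starting from level 1, the condition on valuations allows us to refine the truth tables of $\mathcal{M}_K$ and reduce the search space for countermodels. For instance, since $\psi\vdash^{\mathbb{V}^{\mathcal{F},0}_K}\varphi\supset\psi$ (for every $\mathcal{F}$ with $\{\varphi,\psi,\varphi\supset\psi\}\subseteq\mathcal{F}$), at level 1 we have that if $\psi\in v^{-1}[\mathsf{TF}]$, then $v(\varphi\supset\psi)\in\mathsf{TF}$. This allows us to remove $\mathsf{t}$ and $\mathsf{f}$ from the first and third columns (where $\psi\in\mathsf{TF}$) in the table presenting $\tilde{\supset}$. The following entailments (at level 0), all with a single occurrence of some connective, lead to similar refinements, resulting in the optimized tables below for $\supset$, $\wedge$ and $\vee$:

$$
\varphi,\varphi\supset\psi\vdash^{\mathbb{V}^{\mathcal{F},0}_K}\psi\quad
\psi\vdash^{\mathbb{V}^{\mathcal{F},0}_K}\varphi\supset\psi\quad
\varphi,\psi\vdash^{\mathbb{V}^{\mathcal{F},0}_K}\varphi\wedge\psi\quad
\varphi\wedge\psi\vdash^{\mathbb{V}^{\mathcal{F},0}_K}\varphi\quad
\varphi\wedge\psi\vdash^{\mathbb{V}^{\mathcal{F},0}_K}\psi\quad
\varphi\vdash^{\mathbb{V}^{\mathcal{F},0}_K}\varphi\vee\psi\quad
\psi\vdash^{\mathbb{V}^{\mathcal{F},0}_K}\varphi\vee\psi
$$

$$
\begin{array}{c|cccc}
\tilde{\supset} & \mathsf{T} & \mathsf{t} & \mathsf{F} & \mathsf{f}\\\hline
\mathsf{T} & \{\mathsf{T}\} & \{\mathsf{t}\} & \{\mathsf{F}\} & \{\mathsf{f}\}\\
\mathsf{t} & \{\mathsf{T}\} & \mathcal{D} & \{\mathsf{F}\} & \overline{\mathcal{D}}\\
\mathsf{F} & \{\mathsf{T}\} & \{\mathsf{t}\} & \{\mathsf{T}\} & \{\mathsf{t}\}\\
\mathsf{f} & \{\mathsf{T}\} & \mathcal{D} & \{\mathsf{T}\} & \mathcal{D}
\end{array}
\quad
\begin{array}{c|cccc}
\tilde{\wedge} & \mathsf{T} & \mathsf{t} & \mathsf{F} & \mathsf{f}\\\hline
\mathsf{T} & \{\mathsf{T}\} & \{\mathsf{t}\} & \{\mathsf{F}\} & \{\mathsf{f}\}\\
\mathsf{t} & \{\mathsf{t}\} & \{\mathsf{t}\} & \{\mathsf{f}\} & \{\mathsf{f}\}\\
\mathsf{F} & \{\mathsf{F}\} & \{\mathsf{f}\} & \{\mathsf{F}\} & \{\mathsf{f}\}\\
\mathsf{f} & \{\mathsf{f}\} & \{\mathsf{f}\} & \{\mathsf{f}\} & \{\mathsf{f}\}
\end{array}
\quad
\begin{array}{c|cccc}
\tilde{\vee} & \mathsf{T} & \mathsf{t} & \mathsf{F} & \mathsf{f}\\\hline
\mathsf{T} & \{\mathsf{T}\} & \{\mathsf{T}\} & \{\mathsf{T}\} & \{\mathsf{T}\}\\
\mathsf{t} & \{\mathsf{T}\} & \mathcal{D} & \{\mathsf{T}\} & \mathcal{D}\\
\mathsf{F} & \{\mathsf{T}\} & \{\mathsf{T}\} & \{\mathsf{F}\} & \{\mathsf{F}\}\\
\mathsf{f} & \{\mathsf{T}\} & \mathcal{D} & \{\mathsf{F}\} & \overline{\mathcal{D}}
\end{array}
$$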
We note that level 1 valuations are not fully captured by these tables. For example, they must assign $\mathsf{T}$ to every formula of the form $\varphi\supset\varphi$, while the table above also allows $\mathsf{t}$ when $v(\varphi)\in\mathsf{tf}$. A decision procedure for K can benefit from relying on these optimized tables instead of the original ones, starting from level 1.
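The optimized tables can be recovered by brute force. Under our own assumptions (the tuple encoding and helper names below are hypothetical), the sketch takes the smallest relevant domain $\mathcal{F}=\{p,q,p\circ q\}$ for $\circ\in\{\supset,\wedge,\vee\}$, computes the level-1 valuations as in Definition 5, and records which values $p\circ q$ can receive for each pair of values of $p$ and $q$. With this choice of $\mathcal{F}$, only the single-connective entailments listed above constrain the result, so the cells should match the optimized tables.

```python
# Level-1 tables of M_K for a single binary connective (our own sketch).
V4 = ("T", "t", "F", "f")
D, TF = {"T", "t"}, {"T", "F"}


def pos_val(phi, v):
    """M_K tables for atoms and the binary connectives imp, and, or."""
    if phi[0] == "atom":
        return V4
    x, y = v[phi[1]] in D, v[phi[2]] in D
    ok = {"imp": (not x) or y, "and": x and y, "or": x or y}[phi[0]]
    return ("T", "t") if ok else ("F", "f")


def legal_valuations(F):
    """Level 0; F must list subformulas before the formulas containing them."""
    vals = [{}]
    for phi in F:
        vals = [{**v, phi: x} for v in vals for x in pos_val(phi, v)]
    return vals


def next_level(F, V):
    """Definition 5: keep v iff every phi entailed by v^{-1}[TF] gets a TF value."""
    bit = {phi: 1 << i for i, phi in enumerate(F)}
    mask = lambda u, S: sum(bit[p] for p in F if u[p] in S)
    sats = [mask(u, D) for u in V]
    kept = []
    for v in V:
        gamma, entailed = mask(v, TF), (1 << len(F)) - 1
        for s in sats:
            if (s & gamma) == gamma:      # u satisfies all of v^{-1}[TF]
                entailed &= s
        if (entailed & ~gamma) == 0:
            kept.append(v)
    return kept


def level1_table(op):
    """Values of p op q seen in level-1 valuations over F = {p, q, p op q}."""
    p, q = ("atom", "p"), ("atom", "q")
    comp = (op, p, q)
    F = [p, q, comp]
    table = {}
    for v in next_level(F, legal_valuations(F)):
        table.setdefault((v[p], v[q]), set()).add(v[comp])
    return table


imp = level1_table("imp")
for x in V4:   # row: value of the left argument; columns ordered T, t, F, f
    print(x, [sorted(imp[(x, y)]) for y in V4])
```

Calling `level1_table("and")` or `level1_table("or")` produces the other two tables in the same way.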


4 Soundness and Completeness 


In this section we establish the soundness and completeness of the proposed 
semantics. For that matter, we first extend the notion of satisfaction to sequents: 


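Definition 6. An $\mathcal{F}$-valuation $v$ $\mathcal{D}$-satisfies an $\mathcal{F}$-sequent $\Gamma\Rightarrow\Delta$, denoted by $v\models_{\mathcal{D}}\Gamma\Rightarrow\Delta$, if $v\not\models_{\mathcal{D}}\varphi$ for some $\varphi\in\Gamma$ or $v\models_{\mathcal{D}}\varphi$ for some $\varphi\in\Delta$.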


To prove soundness, we first note that except for (K), the soundness of each 
derivation rule easily follows from the Nmatrix semantics: 


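Lemma 6 (Local Soundness). Consider an application of a rule of $G_K$ other than (K) deriving a sequent $\Gamma\Rightarrow\Delta$ from sequents $\Gamma_1\Rightarrow\Delta_1,\ldots,\Gamma_n\Rightarrow\Delta_n$, such that $\Gamma\cup\Gamma_1\cup\ldots\cup\Gamma_n\cup\Delta\cup\Delta_1\cup\ldots\cup\Delta_n\subseteq\mathcal{F}$. Let $v\in\mathbb{V}^{\mathcal{F},m}_K$ for some $m\geq 0$. If $v\models_{\mathcal{D}}\Gamma_i\Rightarrow\Delta_i$ for every $1\leq i\leq n$, then $v\models_{\mathcal{D}}\Gamma\Rightarrow\Delta$.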


For (K), we make use of the level requirement, and prove the following 
lemma. 


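Lemma 7 (Local Soundness of (K)). Suppose that $\Gamma\cup\Box\Gamma\cup\{\varphi,\Box\varphi\}\subseteq\mathcal{F}$, $m>0$, and $\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m-1}_K}\varphi$. Then, $\Box\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m}_K}\Box\varphi$.

Proof. Let $v\in\mathbb{V}^{\mathcal{F},m}_K$ such that $v\models_{\mathcal{D}}\Box\Gamma$. We prove that $v\models_{\mathcal{D}}\Box\varphi$. By the truth table of $\Box$, we have that $v(\psi)\in\mathsf{TF}$ for every $\psi\in\Gamma$, and we need to show that $v(\varphi)\in\mathsf{TF}$. Since $v(\psi)\in\mathsf{TF}$ for every $\psi\in\Gamma$, we have $\Gamma\subseteq v^{-1}[\mathsf{TF}]$. Since $\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m-1}_K}\varphi$, we have $v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}^{\mathcal{F},m-1}_K}\varphi$. Since $v\in\mathbb{V}^{\mathcal{F},m}_K$, it follows that $v(\varphi)\in\mathsf{TF}$.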


The above two lemmas together establish soundness, and from soundness for 
each level, we easily derive soundness for arbitrary (K)-depth. 


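Theorem 3 (Soundness for m). If $\vdash^{\mathcal{F},m}_{G_K}\Gamma\Rightarrow\Delta$, then $\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m}_K}\Delta$.

Theorem 4 (Soundness without m). If $\vdash^{\mathcal{F}}_{G_K}\Gamma\Rightarrow\Delta$, then $\Gamma\vdash^{\mathbb{V}^{\mathcal{F}}_K}\Delta$.

By taking $\mathcal{F}=\mathcal{L}$ in Theorem 4 we get that if $\vdash_{G_K}\Gamma\Rightarrow\Delta$, then $\Gamma\vdash^{\mathbb{V}_K}\Delta$.

Next, we prove the following two completeness theorems:

Theorem 5 (Completeness for m). Let $\mathcal{F}\subseteq\mathcal{L}$ be closed under subformulas and $\Gamma\Rightarrow\Delta$ an $\mathcal{F}$-sequent. If $\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m}_K}\Delta$, then $\vdash^{\mathcal{F},m}_{G_K}\Gamma\Rightarrow\Delta$.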
Theorem 6 (Completeness without m). Let $\mathcal{F}\subseteq\mathcal{L}$ be closed under subformulas and $\Gamma\Rightarrow\Delta$ an $\mathcal{F}$-sequent. If $\Gamma\vdash^{\mathbb{V}^{\mathcal{F}}_K}\Delta$, then $\vdash^{\mathcal{F}}_{G_K}\Gamma\Rightarrow\Delta$.


In fact, since $\mathcal{F}$ may be infinite, we need to prove stronger theorems than Theorems 5 and 6 that incorporate infinite sequents.


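Definition 7. An $\omega$-sequent is a pair $\langle L,R\rangle$, denoted by $L\Rightarrow R$, such that $L$ and $R$ are (possibly infinite) sets of formulas. We write $\vdash^{\mathcal{F},m}_{G_K}L\Rightarrow R$ if $\vdash^{\mathcal{F},m}_{G_K}\Gamma\Rightarrow\Delta$ for some finite $\Gamma\subseteq L$ and $\Delta\subseteq R$.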


Other notions for sequents (e.g., being an $\mathcal{F}$-sequent) are extended to $\omega$-sequents in the obvious way. In particular, $v\models_{\mathcal{D}}L\Rightarrow R$ if $v(\psi)\notin\mathcal{D}$ for some $\psi\in L$ or $v(\psi)\in\mathcal{D}$ for some $\psi\in R$.


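Theorem 7 ($\omega$-Completeness for m). Let $\mathcal{F}\subseteq\mathcal{L}$ be closed under subformulas and $L\Rightarrow R$ an $\omega$-$\mathcal{F}$-sequent. If $L\vdash^{\mathbb{V}^{\mathcal{F},m}_K}R$, then $\vdash^{\mathcal{F},m}_{G_K}L\Rightarrow R$.

Theorem 8 ($\omega$-Completeness without m). Let $\mathcal{F}\subseteq\mathcal{L}$ be closed under subformulas and $L\Rightarrow R$ an $\omega$-$\mathcal{F}$-sequent. If $L\vdash^{\mathbb{V}^{\mathcal{F}}_K}R$, then $\vdash^{\mathcal{F}}_{G_K}L\Rightarrow R$.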


Theorem 5 is a consequence of Theorem 7. Indeed, by Theorem 7, $\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m}_K}\Delta$ implies that $\vdash^{\mathcal{F},m}_{G_K}\Gamma'\Rightarrow\Delta'$ for some (finite) $\Gamma'\subseteq\Gamma$ and $\Delta'\subseteq\Delta$. Using (WEAK), we obtain that $\vdash^{\mathcal{F},m}_{G_K}\Gamma\Rightarrow\Delta$. Similarly, Theorem 6 is a consequence of Theorem 8. Also, using Lemma 3, we obtain Theorem 8 from Theorem 7. Hence in the remainder of this section we focus on the proof of Theorem 7.


Proof of Theorem 7. We start by defining maximal and consistent $\omega$-sequents, and proving their existence.


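Definition 8 (Maximal and consistent $\omega$-sequent). Let $\mathcal{F}\subseteq\mathcal{L}$ and $m\geq 0$. An $\mathcal{F}$-$\omega$-sequent $L\Rightarrow R$ is called:

1. $\mathcal{F}$-maximal if $\mathcal{F}\subseteq L\cup R$.
2. $(G_K,\mathcal{F},m)$-consistent if $\nvdash^{\mathcal{F},m}_{G_K}L\Rightarrow R$.
3. $(G_K,\mathcal{F},m)$-maximal-consistent (in short, $(G_K,\mathcal{F},m)$-max-con) if it is $\mathcal{F}$-maximal and $(G_K,\mathcal{F},m)$-consistent.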


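Lemma 8. Let $\mathcal{F}\subseteq\mathcal{L}$ and $L\Rightarrow R$ an $\mathcal{F}$-$\omega$-sequent. Suppose that $\nvdash^{\mathcal{F},m}_{G_K}L\Rightarrow R$. Then, there exist sets $L_{MC(G_K,\mathcal{F},m,L\Rightarrow R)}$ and $R_{MC(G_K,\mathcal{F},m,L\Rightarrow R)}$ such that the following hold:

- $L\subseteq L_{MC(G_K,\mathcal{F},m,L\Rightarrow R)}$ and $R\subseteq R_{MC(G_K,\mathcal{F},m,L\Rightarrow R)}$.
- $L_{MC(G_K,\mathcal{F},m,L\Rightarrow R)}\cup R_{MC(G_K,\mathcal{F},m,L\Rightarrow R)}\subseteq\mathcal{F}$.
- $L_{MC(G_K,\mathcal{F},m,L\Rightarrow R)}\Rightarrow R_{MC(G_K,\mathcal{F},m,L\Rightarrow R)}$ is $(G_K,\mathcal{F},m)$-max-con.

Thus, given an underivable $\omega$-sequent, we can extend it to a $(G_K,\mathcal{F},m)$-max-con $\omega$-sequent. This $\omega$-sequent induces the canonical countermodel, as defined next.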
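Algorithm 1. Deciding $\Gamma\vdash^{\mathbb{V}_K}\varphi$.
1: $\mathcal{F}\leftarrow sub(\Gamma\cup\{\varphi\})$
2: $m\leftarrow 4^{|\mathcal{F}|}$
3: for $v\in\mathbb{V}^{\mathcal{F},m}_K$ do
4:     if $v\models_{\mathcal{D}}\Gamma$ and $v\not\models_{\mathcal{D}}\varphi$ then
5:         return (“NO”, $v$)
6: return “YES”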


Notation 9. We denote the set $\{\psi\in\mathcal{F}\mid\Box\psi\in\Sigma\}$ by $B^{\mathcal{F}}_{\Sigma}$.


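Definition 10. Suppose that $L\uplus R=\mathcal{F}$. The canonical model w.r.t. $L\Rightarrow R$, $\mathcal{F}$, and $m$, denoted by $v(\mathcal{F},L\Rightarrow R,m)$, is the $\mathcal{F}$-valuation defined as follows (in $\lambda$ notation):

For $m=0$:
$$
\lambda\psi\in\mathcal{F}.\begin{cases}
\mathsf{T} & \psi\in L\text{ and }\Box\psi\in L\\
\mathsf{t} & \psi\in L\text{ and }\Box\psi\notin L\\
\mathsf{F} & \psi\in R\text{ and }\Box\psi\in L\\
\mathsf{f} & \psi\in R\text{ and }\Box\psi\notin L
\end{cases}
$$

For $m>0$:
$$
\lambda\psi\in\mathcal{F}.\begin{cases}
\mathsf{T} & \psi\in L\text{ and }\vdash^{\mathcal{F},m-1}_{G_K}B^{\mathcal{F}}_L\Rightarrow\psi\\
\mathsf{t} & \psi\in L\text{ and }\nvdash^{\mathcal{F},m-1}_{G_K}B^{\mathcal{F}}_L\Rightarrow\psi\\
\mathsf{F} & \psi\in R\text{ and }\vdash^{\mathcal{F},m-1}_{G_K}B^{\mathcal{F}}_L\Rightarrow\psi\\
\mathsf{f} & \psi\in R\text{ and }\nvdash^{\mathcal{F},m-1}_{G_K}B^{\mathcal{F}}_L\Rightarrow\psi
\end{cases}
$$

Clearly, $v(\mathcal{F},L\Rightarrow R,m)\not\models_{\mathcal{D}}L\Rightarrow R$. The proof of Theorem 7 is done by induction on $m$, and then carries on by showing that if $L\Rightarrow R$ is $(G_K,\mathcal{F},m)$-max-con, then $v(\mathcal{F},L\Rightarrow R,m)$ belongs to $\mathbb{V}^{\mathcal{F},m}_K$.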

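Concretely, let $v=v(\mathcal{F},L\Rightarrow R,m)$. We show that $v\in\mathbb{V}^{\mathcal{F},k}_K$ for every $k\leq m$ by induction on $k$. The base case $k=0$ is straightforward. For $k>0$, we have $v\in\mathbb{V}^{\mathcal{F},k-1}_K$ by the induction hypothesis. Let $\varphi\in\mathcal{F}$, and suppose that $v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}^{\mathcal{F},k-1}_K}\varphi$. To show that $v(\varphi)\in\mathsf{TF}$, we prove that $\vdash^{\mathcal{F},m-1}_{G_K}B^{\mathcal{F}}_L\Rightarrow\varphi$. By the outer induction hypothesis (regarding the completeness theorem for $k-1<m$), $v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}^{\mathcal{F},k-1}_K}\varphi$ implies that $\vdash^{\mathcal{F},k-1}_{G_K}v^{-1}[\mathsf{TF}]\Rightarrow\varphi$, which implies that $\vdash^{\mathcal{F},m-1}_{G_K}v^{-1}[\mathsf{TF}]\Rightarrow\varphi$. Hence, there is a finite set $\{\varphi_1,\ldots,\varphi_n\}\subseteq v^{-1}[\mathsf{TF}]$ such that $\vdash^{\mathcal{F},m-1}_{G_K}\varphi_1,\ldots,\varphi_n\Rightarrow\varphi$. For every $1\leq i\leq n$, since $\varphi_i\in v^{-1}[\mathsf{TF}]$, we have that $\vdash^{\mathcal{F},m-1}_{G_K}B^{\mathcal{F}}_L\Rightarrow\varphi_i$, and hence $\vdash^{\mathcal{F},m-1}_{G_K}\Gamma_i\Rightarrow\varphi_i$ for some finite $\Gamma_i\subseteq B^{\mathcal{F}}_L$. Using $n$ applications of (CUT) on these sequents and $\varphi_1,\ldots,\varphi_n\Rightarrow\varphi$, we obtain that $\vdash^{\mathcal{F},m-1}_{G_K}\Gamma_1,\ldots,\Gamma_n\Rightarrow\varphi$, and so $\vdash^{\mathcal{F},m-1}_{G_K}B^{\mathcal{F}}_L\Rightarrow\varphi$.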


5 Effectiveness of the Semantics 


In this section we study the effectiveness of the semantics introduced in Definition 5 for deciding $\vdash_{G_K}$. Roughly speaking, a semantic framework is said to be effective if it induces a decision procedure that decides its underlying logic.

Consider Algorithm 1. Given a finite set $\Gamma$ of formulas and a formula $\varphi$, it checks whether any valuation in $\mathbb{V}^{\mathcal{F},m}_K$ is a countermodel. The correctness of this algorithm relies on the analyticity of $G_K$, namely:
Lemma 9 ([11]). If $\vdash_{G_K}\Gamma\Rightarrow\Delta$, then $\vdash^{sub(\Gamma\cup\Delta)}_{G_K}\Gamma\Rightarrow\Delta$.

Using Lemma 9, we show that the algorithm is correct.


Lemma 10. Algorithm 1 always terminates, and returns “YES” iff $\Gamma\vdash^{\mathbb{V}_K}\varphi$.


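Proof. Termination follows from the fact that $\mathbb{V}^{\mathcal{F},m}_K$ is finite. Suppose that the result is “YES” and assume for contradiction that $\Gamma\nvdash^{\mathbb{V}_K}\varphi$. Hence, there exists some $u\in\mathbb{V}_K$ such that $u\models_{\mathcal{D}}\Gamma$ and $u\not\models_{\mathcal{D}}\varphi$. Consider $v=u|_{\mathcal{F}}$. Then, $v\in\mathbb{V}^{\mathcal{F}}_K\subseteq\mathbb{V}^{\mathcal{F},m}_K$, which contradicts the fact that the algorithm returns “YES”. Now, suppose that the result is “NO”. Then, there exists some $v\in\mathbb{V}^{\mathcal{F},m}_K$ such that $v\models_{\mathcal{D}}\Gamma$ and $v\not\models_{\mathcal{D}}\varphi$. By Lemma 5, $v\in\mathbb{V}^{\mathcal{F}}_K$. Hence, $\Gamma\nvdash^{\mathbb{V}^{\mathcal{F}}_K}\varphi$. By Theorem 4, we have $\nvdash^{\mathcal{F}}_{G_K}\Gamma\Rightarrow\varphi$. By Lemma 9, we have $\nvdash_{G_K}\Gamma\Rightarrow\varphi$. By Theorem 6, we have $\Gamma\nvdash^{\mathbb{V}_K}\varphi$.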
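Algorithm 1 can be prototyped in Python for small inputs. The sketch below is self-contained and rests on our own assumptions: formulas are nested tuples, helper names are hypothetical, and instead of fixing $m=4^{|\mathcal{F}|}$ as in Line 2, it computes levels until two consecutive ones coincide, which by Lemma 4 yields $\mathbb{V}^{\mathcal{F}}_K$ and by Lemma 5 happens within $4^{|\mathcal{F}|}$ steps, so the search ranges over the same set of valuations.

```python
# Sketch of Algorithm 1 for K (tuple encoding and names are our own assumptions).
V4 = ("T", "t", "F", "f")
D, TF = {"T", "t"}, {"T", "F"}


def size(phi):
    return 1 if phi[0] == "atom" else 1 + sum(size(c) for c in phi[1:])


def sub(phi, acc=None):
    acc = set() if acc is None else acc
    acc.add(phi)
    if phi[0] != "atom":
        for c in phi[1:]:
            sub(c, acc)
    return acc


def pos_val(phi, v):
    if phi[0] == "atom":
        return V4
    d = [v[c] in D for c in phi[1:]]
    ok = {"not": lambda: not d[0], "and": lambda: d[0] and d[1],
          "or": lambda: d[0] or d[1], "imp": lambda: (not d[0]) or d[1],
          "box": lambda: v[phi[1]] in TF}[phi[0]]()
    return ("T", "t") if ok else ("F", "f")


def legal_valuations(F):
    vals = [{}]
    for phi in F:  # F is sorted by size, so subformulas come first
        vals = [{**v, phi: x} for v in vals for x in pos_val(phi, v)]
    return vals


def next_level(F, V):
    bit = {phi: 1 << i for i, phi in enumerate(F)}
    mask = lambda u, S: sum(bit[p] for p in F if u[p] in S)
    sats = [mask(u, D) for u in V]
    kept = []
    for v in V:
        gamma, entailed = mask(v, TF), (1 << len(F)) - 1
        for s in sats:
            if (s & gamma) == gamma:      # u satisfies v^{-1}[TF]
                entailed &= s
        if (entailed & ~gamma) == 0:      # all entailed formulas get T or F
            kept.append(v)
    return kept


def decide(gamma, phi):
    """Return "YES", or ("NO", v) with v a countermodel on sub(gamma + [phi])."""
    F = sorted(set().union(*(sub(g) for g in gamma), sub(phi)), key=size)  # Line 1
    V = legal_valuations(F)
    while True:                           # replaces Line 2: iterate to the fixpoint
        nxt = next_level(F, V)
        if len(nxt) == len(V):
            break
        V = nxt
    for v in V:                           # Lines 3-5
        if all(v[g] in D for g in gamma) and v[phi] not in D:
            return ("NO", v)
    return "YES"                          # Line 6


p, q = ("atom", "p"), ("atom", "q")
ex1 = ("imp", ("box", ("and", p, q)), ("and", ("box", p), ("box", q)))
print(decide([], ex1))                               # prints: YES
answer, countermodel = decide([], ("imp", ("box", p), p))
print(answer)                                        # prints: NO
print(decide([("box", ("and", p, q))], ("box", p)))  # prints: YES
```

The first call reports that the formula of Example 1 (with $p,q$ in place of $p_1,p_2$) is valid, while $\Box p\supset p$, which is not a theorem of K, yields “NO” together with a valuation on $sub(\Box p\supset p)$ falsifying it.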
Lemma 10 shows that Algorithm 1 is a decision procedure for $\vdash_{G_K}$, when ignoring the additional output provided in Line 5. However, it is typical in applications that a “YES” or “NO” answer is not enough, and often it is expected that a “NO” result is accompanied by a countermodel. Algorithm 1 returns a valuation $v$ in case the answer is “NO”, but Lemma 10 does not ensure that $v$ is indeed a countermodel for $\Gamma\vdash^{\mathbb{V}_K}\varphi$. The issue is that the valuation $v$ from the proof of Lemma 10 witnesses the fact that $\Gamma\nvdash^{\mathbb{V}_K}\varphi$ only in a non-constructive way. Indeed, using the soundness and completeness theorems, we are able to deduce that $v'\models_{\mathcal{D}}\Gamma$ and $v'\not\models_{\mathcal{D}}\varphi$ for some $v'\in\mathbb{V}_K$, but the relation between $v$ and $v'$ is unclear. Most importantly, it is not clear whether $v$ and $v'$ agree on $\mathcal{F}$-formulas. In the remainder of this section we prove that such a $v'$ can be chosen to extend $v$, and so the countermodel returned in Line 5 can be trusted.


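We say that a valuation $v'$ extends a valuation $v$ if $Dom(v)\subseteq Dom(v')$ and $v'(\varphi)=v(\varphi)$ for every $\varphi\in Dom(v)$ (identifying functions with sets of pairs, this means $v\subseteq v'$). Clearly, for a $Dom(v)$-formula $\psi$ we have that $v'\models_{\mathcal{D}}\psi$ iff $v\models_{\mathcal{D}}\psi$. We first show how to extend a given valuation $v\in\mathbb{V}^{\mathcal{F},m}_K$ by a single formula $\psi$ such that $sub(\psi)\setminus\{\psi\}\subseteq\mathcal{F}$, obtaining a valuation $v'\in\mathbb{V}^{\mathcal{F}\cup\{\psi\},m}_K$ that agrees with $v$ on all formulas in $\mathcal{F}$.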


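Lemma 11. Let $m\geq 0$, $\mathcal{F}\subseteq\mathcal{L}$, and $v\in\mathbb{V}^{\mathcal{F},m}_K$. Let $\psi\in\mathcal{L}\setminus\mathcal{F}$ such that $sub(\psi)\setminus\{\psi\}\subseteq\mathcal{F}$. Then, $v$ can be extended to some $v'\in\mathbb{V}^{\mathcal{F}\cup\{\psi\},m}_K$.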


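We sketch the proof of Lemma 11. When $m=0$, $v'$ exists by Lemma 1. For $m>0$, we define $v'$ as follows:¹

$$
v'\stackrel{\text{def}}{=}\lambda\varphi\in\mathcal{F}\cup\{\psi\}.\begin{cases}
v(\varphi) & \varphi\in\mathcal{F}\\
\min(\textit{pos-val}(\psi,\mathcal{M}_K,v)\cap\mathsf{TF}) & \varphi=\psi\text{ and }v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}^{\mathcal{F}\cup\{\psi\},m-1}_K}\psi\\
\min(\textit{pos-val}(\psi,\mathcal{M}_K,v)\cap\mathsf{tf}) & \text{otherwise}
\end{cases}
$$

¹ The use of min here assumes an arbitrary order on truth values. It is used here only to choose some element from a non-empty set of truth values.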
The proof of Lemma 11 then carries on by showing that $v'\in\mathbb{V}^{\mathcal{F}\cup\{\psi\},m}_K$.

Next, Lemma 11 is used in order to extend partial valuations into total ones.

Lemma 12. Let $v\in\mathbb{V}^{\mathcal{F},m}_K$ for some $\mathcal{F}$ closed under subformulas. Then, $v$ can be extended to some $v'\in\mathbb{V}^{m}_K$.

Finally, Lemmas 3 and 12 can be used in order to extend any partial valuation in $\mathbb{V}^{\mathcal{F}}_K$ into a total one.

Lemma 13. Let $v\in\mathbb{V}^{\mathcal{F}}_K$ for some set $\mathcal{F}$ closed under subformulas. Then, $v$ can be extended to some $v'\in\mathbb{V}_K$.

We conclude by showing that when Algorithm 1 returns (“NO”, $v$), then $v$ is a finite representation of a true countermodel for $\Gamma\vdash^{\mathbb{V}_K}\varphi$.

Corollary 1. If $\Gamma\nvdash^{\mathbb{V}_K}\varphi$, then Algorithm 1 returns (“NO”, $v$) for some $v$ for which there exists $v'\in\mathbb{V}_K$ such that $v=v'|_{sub(\Gamma\cup\{\varphi\})}$, $v'\models_{\mathcal{D}}\Gamma$, and $v'\not\models_{\mathcal{D}}\varphi$.

Proof. Suppose that $\Gamma\nvdash^{\mathbb{V}_K}\varphi$. Then by Lemma 10, Algorithm 1 does not return “YES”. Therefore, it returns (“NO”, $v$) for some $v\in\mathbb{V}^{\mathcal{F},m}_K$ such that $v\models_{\mathcal{D}}\Gamma$ and $v\not\models_{\mathcal{D}}\varphi$, where $\mathcal{F}=sub(\Gamma\cup\{\varphi\})$ and $m=4^{|\mathcal{F}|}$. By Lemma 5, $v\in\mathbb{V}^{\mathcal{F}}_K$. By Lemma 13, $v$ can be extended to some $v'\in\mathbb{V}_K$. Therefore, $v=v'|_{sub(\Gamma\cup\{\varphi\})}$, $v'\models_{\mathcal{D}}\Gamma$, and $v'\not\models_{\mathcal{D}}\varphi$.


Remark 2. Notice that in scenarios where model generation is not important, $m$ can be set to a much smaller number in Line 2 of Algorithm 1, namely, the “modal depth” of the input.² The reason is that for such $m$, it can be shown that $\vdash^{\mathcal{F},m}_{G_K}\Gamma\Rightarrow\varphi$ iff $\vdash^{\mathcal{F}}_{G_K}\Gamma\Rightarrow\varphi$, by reasoning about the applications of rule (K). Using the soundness and completeness theorems, we get $\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m}_K}\varphi$ iff $\Gamma\vdash^{\mathbb{V}^{\mathcal{F}}_K}\varphi$, and so limiting to such $m$ is enough. Notice, however, that we do not necessarily get $\mathbb{V}^{\mathcal{F},m}_K=\mathbb{V}^{\mathcal{F}}_K$ for such $m$, and so the valuation returned in Line 5 might not be an element of $\mathbb{V}^{\mathcal{F}}_K$.
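To see how much Remark 2 can save, the following Python sketch (using our own tuple encoding; the function names are hypothetical) computes the modal depth defined in footnote 2 next to the bound $4^{|\mathcal{F}|}$ used in Line 2 of Algorithm 1. Taking the maximum over $\Gamma\cup\{\varphi\}$ is our reading of “the modal depth of the input”.

```python
def modal_depth(phi):
    """Modal depth per footnote 2; formulas are nested tuples such as ("box", A)."""
    op = phi[0]
    if op == "atom":
        return 0
    if op == "box":
        return 1 + modal_depth(phi[1])
    return max(modal_depth(c) for c in phi[1:])


def sub(phi, acc):
    """Add all subformulas of phi to the set acc."""
    acc.add(phi)
    if phi[0] != "atom":
        for c in phi[1:]:
            sub(c, acc)
    return acc


def level_bounds(gamma, phi):
    """(modal depth of the input, 4^|F|) for F = sub(gamma + [phi])."""
    inputs = list(gamma) + [phi]
    F = set()
    for psi in inputs:
        sub(psi, F)
    return max(modal_depth(psi) for psi in inputs), 4 ** len(F)


p1, p2 = ("atom", "p1"), ("atom", "p2")
a = ("and", p1, p2)
ex1 = ("imp", ("box", a), ("and", ("box", p1), ("box", p2)))   # Example 1
ex4 = ("imp", ("box", ("box", a)), ("box", p1))                 # Example 4 (KT)
print(level_bounds([], ex1))   # prints: (1, 65536)
print(level_bounds([], ex4))   # prints: (2, 16384)
```

For Example 1, a single level suffices for deciding validity, which agrees with Example 3, whereas Line 2 of Algorithm 1 would use $4^8$.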


6 The Modal Logic KT 


In this section we obtain similar results for the modal logic KT. First, the calculus $G_{KT}$ is obtained from $G_K$ by adding the following rule (see, e.g., [16]):

$$\frac{\Gamma,\varphi\Rightarrow\Delta}{\Gamma,\Box\varphi\Rightarrow\Delta}\,(\text{T})$$

Derivations are defined as before. (In particular, the (K)-depth of a derivation still depends on applications of rule (K), not of rule (T).) We write $\vdash^{\mathcal{F},m}_{G_{KT}}\Gamma\Rightarrow\Delta$ if there is a derivation of $\Gamma\Rightarrow\Delta$ in $G_{KT}$ in which only $\mathcal{F}$-sequents occur and that has (K)-depth at most $m$.

² The modal depth of an atomic formula $p$ is 0. The modal depth of $\Box\varphi$ is the modal depth of $\varphi$ plus 1. The modal depth of $\circ(\varphi_1,\ldots,\varphi_n)$ for $\circ\neq\Box$ is the maximum among the modal depths of $\varphi_1,\ldots,\varphi_n$.

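Next, we consider the semantics. For a valuation $v\in\mathbb{V}_K$ to respect rule (T), we must have that if $v\models_{\mathcal{D}}\Gamma,\varphi\Rightarrow\Delta$, then $v\models_{\mathcal{D}}\Gamma,\Box\varphi\Rightarrow\Delta$. In particular, when $v\not\models_{\mathcal{D}}\Gamma\Rightarrow\Delta$, we get that if $v(\varphi)\notin\mathcal{D}$, then $v(\Box\varphi)\notin\mathcal{D}$. Now, if $v(\varphi)=\mathsf{F}$, then $v(\Box\varphi)\in\mathcal{D}$ according to the truth table of $\Box$ in $\mathcal{M}_K$. But we must have $v(\Box\varphi)\notin\mathcal{D}$. This leads us to remove $\mathsf{F}$ from $\mathcal{M}_K$.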


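We thus obtain the following Nmatrix $\mathcal{M}_{KT}$: the sets of truth values and of designated truth values are given by³

$$\mathcal{V}_3=\{\mathsf{T},\mathsf{t},\mathsf{f}\}\qquad\mathcal{D}=\{\mathsf{T},\mathsf{t}\}$$

and the truth tables are as follows:

$$
\begin{array}{c|ccc}
\tilde{\supset} & \mathsf{T} & \mathsf{t} & \mathsf{f}\\\hline
\mathsf{T} & \mathcal{D} & \mathcal{D} & \{\mathsf{f}\}\\
\mathsf{t} & \mathcal{D} & \mathcal{D} & \{\mathsf{f}\}\\
\mathsf{f} & \mathcal{D} & \mathcal{D} & \mathcal{D}
\end{array}
\quad
\begin{array}{c|ccc}
\tilde{\wedge} & \mathsf{T} & \mathsf{t} & \mathsf{f}\\\hline
\mathsf{T} & \mathcal{D} & \mathcal{D} & \{\mathsf{f}\}\\
\mathsf{t} & \mathcal{D} & \mathcal{D} & \{\mathsf{f}\}\\
\mathsf{f} & \{\mathsf{f}\} & \{\mathsf{f}\} & \{\mathsf{f}\}
\end{array}
\quad
\begin{array}{c|ccc}
\tilde{\vee} & \mathsf{T} & \mathsf{t} & \mathsf{f}\\\hline
\mathsf{T} & \mathcal{D} & \mathcal{D} & \mathcal{D}\\
\mathsf{t} & \mathcal{D} & \mathcal{D} & \mathcal{D}\\
\mathsf{f} & \mathcal{D} & \mathcal{D} & \{\mathsf{f}\}
\end{array}
\quad
\begin{array}{c|c}
x & \tilde{\neg}x\\\hline
\mathsf{T} & \{\mathsf{f}\}\\
\mathsf{t} & \{\mathsf{f}\}\\
\mathsf{f} & \mathcal{D}
\end{array}
\quad
\begin{array}{c|c}
x & \tilde{\Box}x\\\hline
\mathsf{T} & \mathcal{D}\\
\mathsf{t} & \{\mathsf{f}\}\\
\mathsf{f} & \{\mathsf{f}\}
\end{array}
$$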


Again, one may gain intuition from the possible worlds semantics. There, the logic KT is characterized by frames with a reflexive accessibility relation. Thus, for instance, if $\psi$ holds in $w$ but not in some world accessible from $w$ (i.e., $\psi$ has value $\mathsf{t}$), we know that $\Box\psi$ does not hold in $w$, and the reflexivity of the accessibility relation implies that $\Box\psi$ does not hold in some world accessible from $w$ (thus $\Box\psi$ has value $\mathsf{f}$).


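Example 4. Let $\varphi\stackrel{\text{def}}{=}\Box\Box(p_1\wedge p_2)\supset\Box p_1$ and $\mathcal{F}=sub(\varphi)$. The sequent $\Rightarrow\varphi$ has a derivation in $G_{KT}$ using only $\mathcal{F}$-formulas of (K)-depth 1. However, it is not satisfied by all $\mathcal{M}_{KT}$-legal $\mathcal{F}$-valuations. For example, the following valuation is an $\mathcal{M}_{KT}$-legal valuation that does not satisfy $\varphi$:

$$v(p_1)=v(p_2)=\mathsf{t}\qquad v(\Box p_1)=\mathsf{f}\qquad v(p_1\wedge p_2)=v(\Box(p_1\wedge p_2))=v(\Box\Box(p_1\wedge p_2))=\mathsf{T}\qquad v(\varphi)=\mathsf{f}$$

Next, we define the levels of valuations for $\mathcal{M}_{KT}$. These are obtained from Definition 5 by removing the value $\mathsf{F}$: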


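Definition 11. The set $\mathbb{V}^{\mathcal{F},m}_{KT}$ is recursively defined as follows:

- $\mathbb{V}^{\mathcal{F},0}_{KT}$ is the set of $\mathcal{M}_{KT}$-legal $\mathcal{F}$-valuations.
- $\mathbb{V}^{\mathcal{F},m+1}_{KT}\stackrel{\text{def}}{=}\{v\in\mathbb{V}^{\mathcal{F},m}_{KT}\mid\forall\varphi\in\mathcal{F}.\ v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}^{\mathcal{F},m}_{KT}}\varphi\implies v(\varphi)=\mathsf{T}\}$

We also define:

$$\mathbb{V}^{m}_{KT}\stackrel{\text{def}}{=}\mathbb{V}^{\mathcal{L},m}_{KT}\qquad\mathbb{V}^{\mathcal{F}}_{KT}\stackrel{\text{def}}{=}\bigcap_{m\geq 0}\mathbb{V}^{\mathcal{F},m}_{KT}\qquad\mathbb{V}_{KT}\stackrel{\text{def}}{=}\bigcap_{m\geq 0}\mathbb{V}^{m}_{KT}$$

³ In this section we denote the set $\{\mathsf{T}\}$ by $\mathsf{TF}$.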
Example 5. Following Example 4, we note that for every $v\in\mathbb{V}^{\mathcal{F},m}_{KT}$ with $m>0$, we have $v\models_{\mathcal{D}}\varphi$. In particular, the valuation $v$ from Example 4 does not belong to $\mathbb{V}^{\mathcal{F},1}_{KT}$: $\Box(p_1\wedge p_2)\in v^{-1}[\mathsf{TF}]$ and $\Box(p_1\wedge p_2)\vdash^{\mathbb{V}^{\mathcal{F},0}_{KT}}p_1$, but $v(p_1)=\mathsf{t}$.
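The KT variant of the level construction can be checked on Examples 4 and 5 in the same brute-force style. In this self-contained Python sketch (encoding and names are our own assumptions), the only changes with respect to $\mathcal{M}_K$ are that the value $\mathsf{F}$ is absent, non-designated outputs are always $\mathsf{f}$, and $\mathsf{TF}$ stands for $\{\mathsf{T}\}$ as in footnote 3; the formula is $\Box\Box(p_1\wedge p_2)\supset\Box p_1$ and the valuation is the one given in Example 4.

```python
# Levels of M_KT for Example 4 (encoding and names are our own assumptions).
V3 = ("T", "t", "f")
D, TF = {"T", "t"}, {"T"}        # in Sect. 6, TF stands for {T}


def size(phi):
    return 1 if phi[0] == "atom" else 1 + sum(size(c) for c in phi[1:])


def sub(phi, acc=None):
    acc = set() if acc is None else acc
    acc.add(phi)
    if phi[0] != "atom":
        for c in phi[1:]:
            sub(c, acc)
    return acc


def pos_val(phi, v):
    """Tables of M_KT: designated outputs are {T, t}, non-designated ones are {f}."""
    if phi[0] == "atom":
        return V3
    d = [v[c] in D for c in phi[1:]]
    ok = {"not": lambda: not d[0], "and": lambda: d[0] and d[1],
          "or": lambda: d[0] or d[1], "imp": lambda: (not d[0]) or d[1],
          "box": lambda: v[phi[1]] in TF}[phi[0]]()
    return ("T", "t") if ok else ("f",)


def legal_valuations(F):
    vals = [{}]
    for phi in F:  # F is sorted by size, so subformulas come first
        vals = [{**v, phi: x} for v in vals for x in pos_val(phi, v)]
    return vals


def next_level(F, V):
    """Definition 11: keep v iff every phi entailed by v^{-1}[TF] gets value T."""
    bit = {phi: 1 << i for i, phi in enumerate(F)}
    mask = lambda u, S: sum(bit[p] for p in F if u[p] in S)
    sats = [mask(u, D) for u in V]
    kept = []
    for v in V:
        gamma, entailed = mask(v, TF), (1 << len(F)) - 1
        for s in sats:
            if (s & gamma) == gamma:
                entailed &= s
        if (entailed & ~gamma) == 0:
            kept.append(v)
    return kept


p1, p2 = ("atom", "p1"), ("atom", "p2")
a = ("and", p1, p2)
bba, bp1 = ("box", ("box", a)), ("box", p1)
phi = ("imp", bba, bp1)
F = sorted(sub(phi), key=size)
L0 = legal_valuations(F)
L1 = next_level(F, L0)

v4 = {p1: "t", p2: "t", a: "T", ("box", a): "T", bba: "T", bp1: "f", phi: "f"}
print(v4 in L0, v4 in L1)                                            # prints: True False
print(all(v[phi] in D for v in L0), all(v[phi] in D for v in L1))    # prints: False True
```

The output mirrors Examples 4 and 5: the valuation is $\mathcal{M}_{KT}$-legal but is discarded at level 1, after which $\varphi$ holds in every remaining valuation.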


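Similarly to Theorem 2, the levels of valuations converge to a maximal set that satisfies the following condition:

$$\forall v\in\mathbb{V}.\ \forall\varphi\in\mathcal{F}.\ \left(v^{-1}[\mathsf{TF}]\vdash^{\mathbb{V}}\varphi\implies v(\varphi)=\mathsf{T}\right)\qquad(\text{necessitation}_{KT})$$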


Theorem 9. The set $\mathbb{V}^{\mathcal{F}}_{KT}$ is the largest set $\mathbb{V}$ of $\mathcal{M}_{KT}$-legal $\mathcal{F}$-valuations that satisfies necessitation$_{KT}$.


The proof of Theorem 9 is analogous to that of Theorem 2. 


Remark 3. The necessitation$_{KT}$ condition is equivalent to the one given in [8], except that the underlying truth table is different. Theorem 9 proves that our gradual way of defining $\mathbb{V}^{\mathcal{F}}_{KT}$ via levels coincides with the semantic condition from [8].


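As we demonstrated for K, starting from level 1, the condition on valuations allows us to refine the truth tables of $\mathcal{M}_{KT}$ and reduce the search space. Simple entailments (at level 0) lead to the optimized tables below for $\supset$, $\wedge$ and $\vee$:

$$
\begin{array}{c|ccc}
\tilde{\supset} & \mathsf{T} & \mathsf{t} & \mathsf{f}\\\hline
\mathsf{T} & \{\mathsf{T}\} & \{\mathsf{t}\} & \{\mathsf{f}\}\\
\mathsf{t} & \{\mathsf{T}\} & \mathcal{D} & \{\mathsf{f}\}\\
\mathsf{f} & \{\mathsf{T}\} & \mathcal{D} & \mathcal{D}
\end{array}
\quad
\begin{array}{c|ccc}
\tilde{\wedge} & \mathsf{T} & \mathsf{t} & \mathsf{f}\\\hline
\mathsf{T} & \{\mathsf{T}\} & \{\mathsf{t}\} & \{\mathsf{f}\}\\
\mathsf{t} & \{\mathsf{t}\} & \{\mathsf{t}\} & \{\mathsf{f}\}\\
\mathsf{f} & \{\mathsf{f}\} & \{\mathsf{f}\} & \{\mathsf{f}\}
\end{array}
\quad
\begin{array}{c|ccc}
\tilde{\vee} & \mathsf{T} & \mathsf{t} & \mathsf{f}\\\hline
\mathsf{T} & \{\mathsf{T}\} & \{\mathsf{T}\} & \{\mathsf{T}\}\\
\mathsf{t} & \{\mathsf{T}\} & \mathcal{D} & \mathcal{D}\\
\mathsf{f} & \{\mathsf{T}\} & \mathcal{D} & \{\mathsf{f}\}
\end{array}
$$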


Soundness and completeness for $G_{KT}$ are obtained analogously to $G_K$, keeping in mind that $\mathcal{M}_{KT}$ is obtained from $\mathcal{M}_K$ by deleting the value $\mathsf{F}$. For soundness, this is captured by the rule (T). For completeness, the same construction of a countermodel is performed, while rule (T) ensures that it is three-valued.


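Theorem 10 (Soundness and Completeness). Let $\mathcal{F}\subseteq\mathcal{L}$ be closed under subformulas and $\Gamma\Rightarrow\Delta$ an $\mathcal{F}$-sequent.

1. For every $m\geq 0$, $\Gamma\vdash^{\mathbb{V}^{\mathcal{F},m}_{KT}}\Delta$ iff $\vdash^{\mathcal{F},m}_{G_{KT}}\Gamma\Rightarrow\Delta$.
2. $\Gamma\vdash^{\mathbb{V}^{\mathcal{F}}_{KT}}\Delta$ iff $\vdash^{\mathcal{F}}_{G_{KT}}\Gamma\Rightarrow\Delta$.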


Effectiveness is also shown similarly to K. For that matter, we use the following main lemma, whose proof is similar to that of Lemma 13. The only component added to that proof is making sure that the constructed model is three-valued.


Lemma 14. Let $v\in\mathbb{V}^{\mathcal{F}}_{KT}$ for some set $\mathcal{F}$ closed under subformulas. Then, $v$ can be extended to some $v'\in\mathbb{V}_{KT}$.


Let Algorithm 2 be obtained from Algorithm 1 by setting $m$ to $3^{|\mathcal{F}|}$ in Line 2, and taking $v\in\mathbb{V}^{\mathcal{F},m}_{KT}$ in Line 3. Similarly to Lemma 10 and Corollary 1, we get that Algorithm 2 is a model-producing decision procedure for $\vdash_{G_{KT}}$.
Lemma 15. Algorithm 2 always terminates, and returns “YES” iff $\Gamma\vdash^{\mathbb{V}_{KT}}\varphi$. Further, if $\Gamma\nvdash^{\mathbb{V}_{KT}}\varphi$, then it returns (“NO”, $v$) for some $v$ for which there exists $v'\in\mathbb{V}_{KT}$ such that $v=v'|_{sub(\Gamma\cup\{\varphi\})}$, $v'\models_{\mathcal{D}}\Gamma$, and $v'\not\models_{\mathcal{D}}\varphi$.


7 Future Work 


We have introduced a new semantics for the modal logic K, based on levels of 
valuations in many-valued non-deterministic matrices. Our semantics is effective, 
and was shown to tightly correspond to derivations in a sequent calculus for K. 
We also adapted these results for the modal logic KT. 

There are two main directions for future work. The first is to establish similar semantics for other normal modal logics, such as KD, K4, S4 and S5, and to investigate $\Diamond$ as an independent modality. The second is to analyze the complexity, implement and experiment with decision procedures for K and KT based on the proposed semantics. In particular, we plan to consider SAT-based decision procedures that would encode this semantics in SAT, directly or iteratively.
Abstract. The modal logic K is commonly used to represent and reason about necessity and possibility, and its extensions with combinations of additional axioms are used to represent knowledge, belief, desires and intentions. Here we present local reductions of all propositional modal logics in the so-called modal cube, that is, extensions of K with arbitrary combinations of the axioms B, D, T, 4 and 5, to a normal form comprising a formula and the set of modal levels it occurs at. Using these reductions we can carry out reasoning for all these logics with the theorem prover KSP. We define benchmarks for these logics and experiment with the reduction approach as compared to an existing resolution calculus with specialised inference rules for the various logics.


1 Introduction 


Modal logics have been used to represent and reason about mental attitudes such as knowledge, belief, desire and intention, see for example [17,20,31]. These can be represented using extensions of the basic modal logic K with one or more of the axioms B (symmetry), D (seriality), T (reflexivity), 4 (transitivity) and 5 (Euclideanness). The logic K and these extensions form the so-called modal cube, see Fig. 1. In the diagram, a line from a logic $L_1$ to a logic $L_2$ to its right and/or above means that all theorems of $L_1$ are also theorems of $L_2$, but not vice versa. As indicated in Fig. 1, some of the logics have the same theorems, e.g., KB5 and KB4. Also, all logics not explicitly listed have the same theorems as KT5, aka S5. In total there are 15 distinct logics.

[Figure: diagram of the modal cube; recoverable node labels include K, KB, KD, KDB, KT, KTB, K4, K45, KD4, KD45, KB4 = KB5 = KB45, S4 = KT4, and S5 = KT5 = KBD4 = KBT4 = ···]

Fig. 1. Modal Cube: Relationships between modal logics

While these modal logics are well-studied and a multitude of calculi and translations to other logics exist, see, e.g., [1,3-6,9,13,14,16,18,22,41], fully automatic support by provers is still lacking. Early implementations covering
the full modal cube, such as Catach’s TABLEAUX system [7], are no longer 
available. LoTREC 2.0 [10] supports a wide range of logics but is not intended 
as an automatic theorem prover. MOIN [11] supports all the logics but the focus 
is on producing human-readable proofs and countermodels for small formulae. 
Other provers that go beyond just K, like MleanCoP [28] and CEGARBox [15] 
only support a small subset of the 15 logics. There is also a range of translations from modal logics to first-order and higher-order logics [13,18,19,27,33].
Regarding implementations of those, SPASS [33,43] is limited to a subset of the 
15 logics, while LEO-III [13,36] supports all the logics in the modal cube, but 
can only solve very few of the available benchmark formulae. 

KSP [23] is a modal logic theorem prover that implements both the modal-layered resolution (MLR) calculus [25] for the modal logic K and the global resolution (GMR) calculus [24] for all the 15 logics considered here. It also supports several refinements of resolution and a range of simplification rules. In this paper, we give reductions of all logics of the modal cube into a normal form for the basic modal logic K. We then compare the performance of the combination of these reductions with the modal-layered resolution calculus to that of the global resolution calculus on a new benchmark collection for the modal cube.

In [29] we have presented new reductions¹ of the propositional modal logics KB, KD, KT, K4, and K5 to Separated Normal Form with Sets of Modal Levels, $\mathrm{SNF}_{sml}$. $\mathrm{SNF}_{sml}$ is a generalisation of the Separated Normal Form with Modal Level, $\mathrm{SNF}_{ml}$. In the latter, labelled modal clauses are used where a natural number label refers to a particular level within a tree Kripke structure at which a modal clause holds. In the former, a finite or infinite set of natural numbers labels each modal clause with the intended meaning that such a modal clause is true at every level of a tree Kripke structure contained in that set. As our prover KSP and the modal-layered resolution calculus it implements currently only support sets of modal clauses in $\mathrm{SNF}_{ml}$, we then use a further reduction from $\mathrm{SNF}_{sml}$


1 A reduction here is a satisfiability preserving mapping between logics. 
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to SNF,,, to obtain an automatic theorem prover for these modal logics. Where 
all modal clauses are labelled with finite sets, this reduction is straightforward. 
This is the case for KB, KD and KT. For K4 and K5, characterised by the axioms 
po p and Oy — Oy, modal clauses are in general labelled with infinite 
sets. However, using a result by Massacci [21] for K4 and an analogous result 
for K5 by ourselves, we are able to bound the maximal level occurring in those 
labelling sets which in turn makes a reduction to SNF,,,; possible. 

Also in [29], we have shown experimentally that these reductions allow us 
to reason effectively in these logics, compared to the global modal resolution 
calculus [24] and to the relational and semi-functional translation built into the 
first-order theorem prover SPASS 3.9 [33,38,42]. The reason that the comparison 
only included a rather limited selection of provers is that these are the only ones 
with built-in support for all six logics our reductions covered. 

Unfortunately, we cannot simply combine our reductions for single axioms to obtain satisfiability-preserving reductions for their combinations. There are two main reasons for this. First, our calculus does not use an explicit representation of the accessibility relation within a Kripke structure, which would make it possible to reflect modal axioms via corresponding properties of that accessibility relation. Instead, we add labelled modal clauses based on instances of the modal axioms for □-formulae occurring in the modal formula whose satisfiability we want to check. However, if we deal with multiple modal axioms, then these axioms might interact, making it necessary to add instances that are not needed for any individual axiom. For instance, consider the contrapositive of axiom B, ◇□φ → φ, and axiom 4, □φ → □□φ. Together they imply ◇□φ → □φ. Instances of this derived axiom are necessary for completeness of a reduction from KB4 to K, but are unsound for KB and K4 separately.

Second, our reductions attempt to keep the labelling sets minimal in size in order to decrease the number of inferences that can be performed. Again taking axioms B and 4 as examples, in KB a □-formula □ψ true at level ml in a tree-like Kripke structure M forces ψ to be true at level ml − 1, while in K4, □ψ true at level ml in M forces ψ to be true at all levels ml' with ml' > ml. This is reflected in the labelling sets we use for these two logics. However, for KB4, □ψ true at level ml forces ψ to be true at every level in a tree-like Kripke structure M (unless M consists only of a single world).

Since we intend to maintain these two properties of our reductions, we have to 
consider each modal logic individually. As we will see, for some logics a reduction 
can be obtained as the union of the existing reductions while for others we need 
a logic-specific reduction to accommodate the interaction of axioms. 

The structure of the paper is as follows. In Sect. 2 we recall common concepts of propositional modal logic and the definition of our normal form SNF_sml. Section 3 introduces our reduction for extensions of the basic modal logic K with combinations of the axioms B, D, T, 4, and 5. Section 4 presents a transformation from SNF_sml to SNF_ml, which allows us to use the modal resolution prover KSP to reason in all the modal logics. In Sect. 5 we compare the performance of a combination of our reductions and the modal-layered resolution calculus implemented in the prover KSP with resolution calculi specifically designed for the logics under consideration, as well as with the prover LEO-III.
2 Preliminaries

The language of modal logic is an extension of the language of propositional logic with a unary modal operator □ and its dual ◇. More precisely, given a denumerable set of propositional symbols, P = {p, p_0, q, q_0, t, t_0, ...}, as well as propositional constants true and false, modal formulae are inductively defined as follows: constants and propositional symbols are modal formulae. If φ and ψ are modal formulae, then so are ¬φ, (φ ∧ ψ), (φ ∨ ψ), (φ → ψ), □φ, and ◇φ. We also assume that ∧ and ∨ are associative and commutative operators and consider, e.g., (p ∨ (q ∨ r)) and (r ∨ (q ∨ p)) to be identical formulae. We often omit parentheses if this does not cause confusion. By var(φ) we denote the set of all propositional symbols occurring in φ. This function straightforwardly extends to finite sets of modal formulae. A modal axiom (schema) is a modal formula ψ representing the set of all instances of ψ.

A literal is either a propositional symbol or its negation; the set of literals is denoted by L_P. By ¬l we denote the complement of the literal l ∈ L_P, that is, if l is the propositional symbol p then ¬l denotes ¬p, and if l is the literal ¬p then ¬l denotes p. By |l| for l ∈ L_P we denote p if l = p or l = ¬p. A modal literal is either □l or ◇l, where l ∈ L_P.

A (normal) modal logic is a set of modal formulae which includes all propositional tautologies and the axiom schema □(φ → ψ) → (□φ → □ψ), called the axiom K, and which is closed under modus ponens (if ⊢ φ and ⊢ φ → ψ then ⊢ ψ) and the rule of necessitation (if ⊢ φ then ⊢ □φ).

K is the weakest modal logic, that is, the logic given by the smallest set of modal formulae constituting a normal modal logic. By KΣ we denote an extension of K by a set Σ of axioms.

The standard semantics of modal logics is the Kripke semantics or possible world semantics. A Kripke frame F is an ordered pair ⟨W, R⟩ where W is a non-empty set of worlds and R is a binary (accessibility) relation over W. A Kripke structure M over P is an ordered pair ⟨F, V⟩ where F is a Kripke frame and the valuation V is a function mapping each propositional symbol p in P to a subset V(p) of W. A rooted Kripke structure is an ordered pair ⟨M, w_0⟩ with w_0 ∈ W. To simplify notation, in the following we write ⟨W, R, V⟩ and ⟨W, R, V, w_0⟩ instead of ⟨⟨W, R⟩, V⟩ and ⟨⟨⟨W, R⟩, V⟩, w_0⟩, respectively.

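Satisfaction (or truth) of a formula at a world w of a Kripke structure M = ⟨W, R, V⟩ is inductively defined by:

$$
\begin{array}{lll}
\langle M, w\rangle \models \mathbf{true}; & & \langle M, w\rangle \not\models \mathbf{false};\\
\langle M, w\rangle \models p & \text{iff} & w \in V(p), \text{ where } p \in P;\\
\langle M, w\rangle \models \neg\varphi & \text{iff} & \langle M, w\rangle \not\models \varphi;\\
\langle M, w\rangle \models (\varphi \land \psi) & \text{iff} & \langle M, w\rangle \models \varphi \text{ and } \langle M, w\rangle \models \psi;\\
\langle M, w\rangle \models (\varphi \lor \psi) & \text{iff} & \langle M, w\rangle \models \varphi \text{ or } \langle M, w\rangle \models \psi;\\
\langle M, w\rangle \models (\varphi \to \psi) & \text{iff} & \langle M, w\rangle \models \neg\varphi \text{ or } \langle M, w\rangle \models \psi;\\
\langle M, w\rangle \models \Box\varphi & \text{iff} & \text{for every } v,\ w\,R\,v \text{ implies } \langle M, v\rangle \models \varphi;\\
\langle M, w\rangle \models \Diamond\varphi & \text{iff} & \text{there is } v \text{ such that } w\,R\,v \text{ and } \langle M, v\rangle \models \varphi.
\end{array}
$$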
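To make these clauses concrete, the following Python sketch evaluates a modal formula at a world of a finite Kripke structure. The tuple encoding of formulae and the names `Kripke` and `holds` are illustrative assumptions, not part of the paper or of KSP.

```python
from typing import Dict, Set, Tuple

# Hypothetical tuple encoding of modal formulae (not from the paper):
# ("true",), ("false",), ("var", "p"), ("not", f), ("and", f, g),
# ("or", f, g), ("imp", f, g), ("box", f), ("dia", f)

class Kripke:
    """A finite Kripke structure <W, R, V>."""

    def __init__(self, worlds: Set[str], rel: Set[Tuple[str, str]],
                 val: Dict[str, Set[str]]):
        self.worlds = worlds
        self.rel = rel
        self.val = val  # V: propositional symbol -> set of worlds where it is true

    def successors(self, w: str) -> Set[str]:
        return {v for (u, v) in self.rel if u == w}


def holds(m: Kripke, w: str, f: tuple) -> bool:
    """<M, w> |= f, following the satisfaction clauses above."""
    op = f[0]
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "var":
        return w in m.val.get(f[1], set())
    if op == "not":
        return not holds(m, w, f[1])
    if op == "and":
        return holds(m, w, f[1]) and holds(m, w, f[2])
    if op == "or":
        return holds(m, w, f[1]) or holds(m, w, f[2])
    if op == "imp":
        return (not holds(m, w, f[1])) or holds(m, w, f[2])
    if op == "box":  # every R-successor satisfies the argument
        return all(holds(m, v, f[1]) for v in m.successors(w))
    if op == "dia":  # some R-successor satisfies the argument
        return any(holds(m, v, f[1]) for v in m.successors(w))
    raise ValueError(f"unknown operator {op!r}")


# Two worlds, w0 R w1, with p true only at w1.
m = Kripke({"w0", "w1"}, {("w0", "w1")}, {"p": {"w1"}})
print(holds(m, "w0", ("box", ("var", "p"))))  # True
print(holds(m, "w1", ("dia", ("true",))))     # False: w1 has no successor
```

Note that a world without successors, such as w1, satisfies every □-formula, including □false, and no ◇-formula.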
Table 1. Modal axioms and relational frame properties

Name | Axiom | Frame property
D | □φ → ◇φ | Serial: ∀v∃w. v R w
T | □φ → φ | Reflexive: ∀w. w R w
B | φ → □◇φ | Symmetric: ∀v w. v R w → w R v
4 | □φ → □□φ | Transitive: ∀u v w. (u R v ∧ v R w) → u R w
5 | ◇φ → □◇φ | Euclidean: ∀u v w. (u R v ∧ u R w) → v R w


Table 2. Rewriting Rules for Simplification

φ ∧ φ ⇒ φ | φ ∧ ¬φ ⇒ false | □true ⇒ true | ¬true ⇒ false | ¬¬φ ⇒ φ
φ ∨ φ ⇒ φ | φ ∨ ¬φ ⇒ true | ◇false ⇒ false | ¬false ⇒ true
φ ∧ true ⇒ φ | φ ∧ false ⇒ false | φ ∨ false ⇒ φ | φ ∨ true ⇒ true


If ⟨M, w⟩ ⊨ φ holds, then M is a model of φ, φ is true at w in M, and M satisfies φ. A modal formula φ is satisfiable iff there exist a Kripke structure M and a world w in M such that ⟨M, w⟩ ⊨ φ.

We are interested in extensions of K with the modal axioms shown in Table 1 
and their combinations. Each of these axioms defines a class of Kripke frames 
where the accessibility relation R satisfies the first-order property stated in the 
table. Combinations of axioms then define a class of Kripke frames where the 
accessibility relation satisfies the combination of their corresponding properties. 
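The frame properties of Table 1 can be checked by brute force on small frames. The sketch below, with hypothetical helper names and frames of at most three worlds, tests the derived axiom ◇□p → □p from the introduction: it holds on every symmetric and transitive frame, as needed for KB4, but fails on some frame that is only symmetric and on some frame that is only transitive, which is why its instances are unsound for KB and K4 separately. A finite search of this kind illustrates the claim; it is not a proof.

```python
from itertools import product

# Frame properties from Table 1 for a finite frame (W, R), with R a set of pairs.
def serial(W, R):
    return all(any((v, w) in R for w in W) for v in W)

def reflexive(W, R):
    return all((w, w) in R for w in W)

def symmetric(W, R):
    return all((w, v) in R for (v, w) in R)

def transitive(W, R):
    return all((u, x) in R for (u, v) in R for (v2, x) in R if v == v2)

def euclidean(W, R):
    return all((v, w) in R for (u, v) in R for (u2, w) in R if u == u2)

def derived_axiom(W, R, P):
    """<>[]p -> []p at every world, where p is true exactly at the worlds in P."""
    succ = {w: {v for (u, v) in R if u == w} for w in W}
    box_p = {w for w in W if succ[w] <= P}               # worlds satisfying []p
    return all(succ[w] <= P for w in W if succ[w] & box_p)

def valid_on_frames(n, props, axiom):
    """Does axiom hold for every valuation on every n-world frame satisfying props?"""
    W = list(range(n))
    pairs = [(u, v) for u in W for v in W]
    for bits in product([False, True], repeat=len(pairs)):
        R = {pr for pr, b in zip(pairs, bits) if b}
        if not all(prop(W, R) for prop in props):
            continue
        for vbits in product([False, True], repeat=n):
            P = {w for w, b in zip(W, vbits) if b}
            if not axiom(W, R, P):
                return False
    return True

print(valid_on_frames(3, [symmetric, transitive], derived_axiom))  # True (KB4)
print(valid_on_frames(3, [symmetric], derived_axiom))              # False (KB)
print(valid_on_frames(3, [transitive], derived_axiom))             # False (K4)
```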

Given a normal modal logic L with corresponding class of frames 𝔉, we say a modal formula φ is L-satisfiable iff there exist a frame F ∈ 𝔉, a valuation V and a world w ∈ F such that ⟨F, V, w⟩ ⊨ φ. It is L-valid or valid in L iff for every frame F ∈ 𝔉, every valuation V and every world w ∈ F, ⟨F, V, w⟩ ⊨ φ. A normal modal logic L_2 is an extension of a normal modal logic L_1 iff all L_1-valid formulae are also L_2-valid.

A rooted Kripke structure M = ⟨W, R, V, w_0⟩ is a rooted tree Kripke structure iff R is a tree, that is, a directed acyclic connected graph where each node has at most one predecessor, with root w_0. It is a rooted tree Kripke model of a modal formula φ iff ⟨W, R, V, w_0⟩ ⊨ φ. In a rooted tree Kripke structure with root w_0, for every world w_k ∈ W there is exactly one path connecting w_0 and w_k; the length of that path is the modal level of w_k (in M), denoted by ml_M(w_k).

It is well-known [17] that a modal formula φ is K-satisfiable iff there is a finite rooted tree Kripke structure M = ⟨F, V, w_0⟩ such that ⟨M, w_0⟩ ⊨ φ.

For the reductions presented in the next section we assume that any modal formula φ has been simplified by exhaustively applying the rewrite rules in Table 2, and that it is in Negation Normal Form (NNF), that is, a formula where only propositional symbols are allowed in the scope of negations. We say that such a formula is in simplified NNF.
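A minimal sketch of computing simplified NNF: `nnf` pushes negations inwards (eliminating → on the way), and `simplify` applies the rules of Table 2 bottom-up. As simplifying assumptions, ∧ and ∨ are binary here rather than associative and commutative, so rules such as φ ∧ φ ⇒ φ only fire on the two direct arguments, and complementary pairs are detected for literals only (after NNF, ¬φ for a compound φ has already been pushed inwards). The function names and the tuple encoding are hypothetical.

```python
# Hypothetical tuple encoding: ("true",), ("false",), ("var", "p"), ("not", f),
# ("and", f, g), ("or", f, g), ("imp", f, g), ("box", f), ("dia", f).

def nnf(f, neg=False):
    """Push negations inwards so that only propositional symbols are negated."""
    op = f[0]
    if op == "var":
        return ("not", f) if neg else f
    if op == "true":
        return ("false",) if neg else f
    if op == "false":
        return ("true",) if neg else f
    if op == "not":
        return nnf(f[1], not neg)
    if op == "imp":  # (a -> b) is treated as (~a | b)
        return nnf(("or", ("not", f[1]), f[2]), neg)
    if op in ("and", "or"):
        dual = {"and": "or", "or": "and"}[op] if neg else op
        return (dual, nnf(f[1], neg), nnf(f[2], neg))
    if op in ("box", "dia"):
        dual = {"box": "dia", "dia": "box"}[op] if neg else op
        return (dual, nnf(f[1], neg))
    raise ValueError(op)

def complement(f):
    return f[1] if f[0] == "not" else ("not", f)

def simplify(f):
    """Apply the Table 2 rules bottom-up to a formula already in NNF."""
    op = f[0]
    if op in ("box", "dia"):
        g = simplify(f[1])
        if op == "box" and g == ("true",):
            return ("true",)                  # []true => true
        if op == "dia" and g == ("false",):
            return ("false",)                 # <>false => false
        return (op, g)
    if op in ("and", "or"):
        a, b = simplify(f[1]), simplify(f[2])
        unit, zero = (("true",), ("false",)) if op == "and" else (("false",), ("true",))
        if a == zero or b == zero:
            return zero                       # p & false => false, p | true => true
        if a == unit:
            return b                          # true & p => p, false | p => p
        if b == unit:
            return a
        if a == b:
            return a                          # p & p => p, p | p => p
        if a == complement(b):
            return zero                       # p & ~p => false, p | ~p => true
        return (op, a, b)
    return f

p = ("var", "p")
print(simplify(nnf(("not", ("and", p, ("box", ("not", ("false",))))))))
# ('not', ('var', 'p'))
```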

The reductions produce formulae in a clausal normal form, called Separated Normal Form with Sets of Modal Levels, SNF_sml, introduced in [29]. The language of SNF_sml extends that of the basic modal logic K with sets of modal levels as labels. Clauses in SNF_sml have one of the following forms:

$$
S : \bigvee_{i=1}^{n} l_i \ \ \text{(literal clause)} \qquad
S : l' \to \Box l \ \ \text{(positive modal clause)} \qquad
S : l' \to \Diamond l \ \ \text{(negative modal clause)}
$$

where S ⊆ ℕ and l, l', l_i are propositional literals with 1 ≤ i ≤ n, n ∈ ℕ. We write * : φ instead of ℕ : φ, and such clauses are called global clauses. Positive and negative modal clauses are together known as modal clauses.

Given a rooted tree Kripke structure M and a set S of natural numbers, by M[S] we denote the set of worlds that are at a modal level in S, that is, M[S] = {w ∈ W | ml_M(w) ∈ S}. Then

$$
M \models S : \varphi \quad \text{iff} \quad \langle M, w\rangle \models \varphi \text{ for every world } w \in M[S].
$$


The motivation for using a set S to label clauses is that in our reductions the formula φ may hold at several levels, possibly an infinite number of levels. It therefore makes sense to label such formulae not with just a single level, but with a set of levels. The Separated Normal Form with Modal Level, SNF_ml, can be seen as the special case of SNF_sml where all labelling sets are singletons.

Note that if S = ∅, then M ⊨ S : φ trivially holds. Also, a Kripke structure M can satisfy S : false if there is no world w with ml_M(w) ∈ S. On the other hand, S : false with 0 ∈ S is unsatisfiable, as a rooted tree Kripke structure always has a world with modal level 0.
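The labelled semantics can be sketched for finite rooted trees: compute ml_M(w) by breadth-first search from the root and check the clause at every world of M[S]. In this illustration, which is an assumption rather than KSP's data structures, literals are strings such as 'p' and '~p', clauses are tuples, and the label S is passed as a membership test so that infinite sets such as * can be used. The calls reproduce the observations above about S = ∅ and S : false.

```python
from collections import deque

def modal_levels(root, succ):
    """ml_M(w) for every world w of a rooted tree: length of the path from the root."""
    ml = {root: 0}
    queue = deque([root])
    while queue:
        w = queue.popleft()
        for v in succ.get(w, ()):
            ml[v] = ml[w] + 1
            queue.append(v)
    return ml

def lit_true(val, w, l):
    """Literals are 'p' or '~p'; val maps each world to the symbols true there."""
    return l[1:] not in val[w] if l.startswith("~") else l in val[w]

def clause_true(succ, val, w, clause):
    kind = clause[0]
    if kind == "lit":                      # ("lit", [l1, ..., ln]); [] is false
        return any(lit_true(val, w, l) for l in clause[1])
    _, l1, l2 = clause                     # ("box", l1, l2) or ("dia", l1, l2)
    if not lit_true(val, w, l1):
        return True
    targets = [lit_true(val, v, l2) for v in succ.get(w, ())]
    return all(targets) if kind == "box" else any(targets)

def holds_at_levels(root, succ, val, in_S, clause):
    """M |= S : clause iff the clause is true at every world in M[S]."""
    ml = modal_levels(root, succ)
    return all(clause_true(succ, val, w, clause) for w in ml if in_S(ml[w]))

succ = {"w0": ["w1"]}
val = {"w0": {"p"}, "w1": set()}
FALSE = ("lit", [])
print(holds_at_levels("w0", succ, val, lambda n: False, FALSE))   # True: S is empty
print(holds_at_levels("w0", succ, val, lambda n: n == 5, FALSE))  # True: no world at level 5
print(holds_at_levels("w0", succ, val, lambda n: n == 0, FALSE))  # False: 0 is in S
```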

If M ⊨ S : φ, then we say that S : φ holds in M or is true in M. For a set Φ of labelled formulae, M ⊨ Φ iff M ⊨ S : φ for every S : φ in Φ, and we say Φ is K-satisfiable iff there is a rooted tree Kripke structure M such that M ⊨ Φ.

We introduce some notation that will be used in the following. Let S⁺ = {l + 1 ∈ ℕ | l ∈ S}, S⁻ = {l − 1 ∈ ℕ | l ∈ S}, and S^≥ = {n | n ≥ min(S)}, where min(S) is the least element in S. Note that the restriction of the elements being in ℕ implies that S⁻ cannot contain negative numbers.
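The operations S⁺, S⁻ and S^≥ can be implemented on a simple representation of possibly infinite sets of levels. The sketch below assumes, purely for illustration, that a label is a finite set together with an optional unbounded tail {n | n ≥ k}. This representation is closed under the three operations and covers labels such as {0}, S^≥ and *, but not arbitrary sets, and the class `Levels` is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Levels:
    """Set of modal levels: the finite part plus, if tail is set, every n >= tail."""
    finite: FrozenSet[int] = frozenset()
    tail: Optional[int] = None

    def __contains__(self, n: int) -> bool:
        return n in self.finite or (self.tail is not None and n >= self.tail)

    def least(self) -> int:
        """min(S); raises ValueError if S is empty."""
        return min([*self.finite, *([] if self.tail is None else [self.tail])])

    def plus(self) -> "Levels":
        """S+ = {l + 1 | l in S}."""
        return Levels(frozenset(l + 1 for l in self.finite),
                      None if self.tail is None else self.tail + 1)

    def minus(self) -> "Levels":
        """S- = {l - 1 in N | l in S}: level 0 contributes nothing."""
        return Levels(frozenset(l - 1 for l in self.finite if l > 0),
                      None if self.tail is None else max(self.tail - 1, 0))

    def geq(self) -> "Levels":
        """S>= = {n | n >= min(S)}."""
        return Levels(frozenset(), self.least())

STAR = Levels(tail=0)  # the label *, i.e. all of N

S = Levels(frozenset({2}))
print(sorted(n for n in range(6) if n in S.plus().geq()))  # [3, 4, 5]
print(sorted(n for n in range(6) if n in S.minus()))       # [1]
```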


3 Extensions of K 


In this section we define reductions from all the logics in the modal cube to SNF_sml. We assume that the set P of propositional symbols is partitioned into two infinite sets Q and T such that Q contains the propositional symbols of the modal formula φ under consideration, and T contains surrogate symbols t_ψ for every subformula ψ of φ as well as supplementary propositional symbols. In particular, for every modal formula ψ we have var(ψ) ⊆ Q and there exists a propositional symbol t_ψ ∈ T uniquely associated with ψ. These surrogate symbols serve the same purpose as Tseitin variables [40] and Skolem predicates [30,39] in the transformation of propositional and first-order formulae, respectively, to clausal form via structural transformation.

It turns out that, given a reduction ρ_KΣ for KΣ with {D, T} ∩ Σ = ∅, there is a uniform and straightforward way to obtain reductions for KDΣ and KTΣ from ρ_KΣ. Also, the valid formulae of KDTΣ are the same as those of KTΣ, so we do not need to consider the case of adding both axioms to KΣ. Similarly, the logics KT45, KDB4, KTB4 and KT5 all have the same set of valid formulae. Therefore, as shown in Table 3, we can divide the 15 modal logics into three categories: six 'base logics', five modal logics obtained by extending a 'base logic' with D, and a further four modal logics obtained by extending a 'base logic' with T. For four of the six 'base logics' (namely, K, KB, K4, and K5) we have already devised reductions in [29], so only two (i.e., KB4 and K45) remain.

Table 3. Categorisation of modal logics in the modal cube

'Base logics' | K | KB | K4 | K5 | KB4 | K45
Extensions with D | KD | KDB | KD4 | KD5 | | KD45
Extensions with T | KT | KTB | KT4 | KT5 | |

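Given a modal formula φ in simplified NNF and L = KΣ with Σ ⊆ {B, D, T, 4, 5}, we can obtain a set Φ_L of clauses in SNF_sml such that φ is L-satisfiable iff Φ_L is K-satisfiable, with Φ_L = ρ_L^sml(φ) = {{0} : t_φ} ∪ ρ_L({0} : t_φ → φ), where ρ_L is defined as follows:

$$
\begin{array}{lcl}
\rho_L(S : t \to \mathbf{true}) & = & \emptyset\\
\rho_L(S : t \to \mathbf{false}) & = & \{S : \neg t\}\\
\rho_L(S : t \to (\psi_1 \land \psi_2)) & = & \{S : \neg t \lor \eta(\psi_1),\ S : \neg t \lor \eta(\psi_2)\} \cup \delta_L(S, \psi_1) \cup \delta_L(S, \psi_2)\\
\rho_L(S : t \to \psi) & = & \{S : \neg t \lor \psi\} \quad \text{if } \psi \text{ is a disjunction of literals}\\
\rho_L(S : t \to (\psi_1 \lor \psi_2)) & = & \{S : \neg t \lor \eta(\psi_1) \lor \eta(\psi_2)\} \cup \delta_L(S, \psi_1) \cup \delta_L(S, \psi_2)\\
 & & \quad \text{if } \psi_1 \lor \psi_2 \text{ is not a disjunction of literals}\\
\rho_L(S : t \to \Diamond\psi) & = & \{S : t \to \Diamond\eta(\psi)\} \cup \delta_L(S^{+}, \psi)\\
\rho_L(S : t \to \Box\psi) & = & P_L(S : t \to \Box\psi) \cup \Delta_L(S : t \to \Box\psi)
\end{array}
$$

η and δ_L are defined as follows:

$$
\eta(\psi) = \begin{cases} \psi, & \text{if } \psi \text{ is a literal}\\ t_\psi, & \text{otherwise} \end{cases}
\qquad
\delta_L(S, \psi) = \begin{cases} \emptyset, & \text{if } \psi \text{ is a literal}\\ \rho_L(S : t_\psi \to \psi), & \text{otherwise} \end{cases}
$$

and the functions P_L and Δ_L are defined as shown in Table 4.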
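To illustrate how the recursive definition unfolds, the following Python sketch implements ρ_K^sml for the basic modal logic K only, using the K row of Table 4, that is, P_K(S : t_{□ψ} → □ψ) = {S : t_{□ψ} → □η(ψ)} and Δ_K(S : t_{□ψ} → □ψ) = δ_K(S⁺, ψ). Surrogate names t0, t1, ... are generated in order of first use, labels are finite frozensets, and the tuple encoding of formulae and clauses is an assumption made for this example, not KSP's input format.

```python
from itertools import count

# Hypothetical encoding:
#   formulae in simplified NNF: ("true",), ("false",), ("var", "p"),
#   ("not", ("var", "p")), ("and", f, g), ("or", f, g), ("box", f), ("dia", f)
#   clauses: ("lit", S, (l1, ..., ln)) for S : l1 v ... v ln,
#            ("box", S, l1, l2) for S : l1 -> []l2, ("dia", S, l1, l2) for S : l1 -> <>l2
#   literals are strings "p" / "~p"; labels S are finite frozensets of levels.

def is_literal(f):
    return f[0] == "var" or (f[0] == "not" and f[1][0] == "var")

def lit(f):
    return f[1] if f[0] == "var" else "~" + f[1][1]

def neg(l):
    return l[1:] if l.startswith("~") else "~" + l

def lit_disjuncts(f):
    """The literals of f if f is a disjunction of literals, otherwise None."""
    if is_literal(f):
        return [lit(f)]
    if f[0] == "or":
        a, b = lit_disjuncts(f[1]), lit_disjuncts(f[2])
        if a is not None and b is not None:
            return a + b
    return None

class ReductionK:
    def __init__(self):
        self.names = {}       # subformula psi -> surrogate t_psi
        self.fresh = count()

    def surrogate(self, f):
        if f not in self.names:
            self.names[f] = f"t{next(self.fresh)}"
        return self.names[f]

    def eta(self, f):
        return lit(f) if is_literal(f) else self.surrogate(f)

    def delta(self, S, f):
        return [] if is_literal(f) else self.rho(S, self.surrogate(f), f)

    def rho(self, S, t, f):
        op = f[0]
        if op == "true":
            return []
        if op == "false":
            return [("lit", S, (neg(t),))]
        lits = lit_disjuncts(f)
        if lits is not None:                      # f is a disjunction of literals
            return [("lit", S, (neg(t), *lits))]
        if op == "and":
            return ([("lit", S, (neg(t), self.eta(f[1]))),
                     ("lit", S, (neg(t), self.eta(f[2])))]
                    + self.delta(S, f[1]) + self.delta(S, f[2]))
        if op == "or":
            return ([("lit", S, (neg(t), self.eta(f[1]), self.eta(f[2])))]
                    + self.delta(S, f[1]) + self.delta(S, f[2]))
        S_plus = frozenset(l + 1 for l in S)
        if op == "dia":
            return [("dia", S, t, self.eta(f[1]))] + self.delta(S_plus, f[1])
        if op == "box":                           # K row of Table 4: P_K and Delta_K
            return [("box", S, t, self.eta(f[1]))] + self.delta(S_plus, f[1])
        raise ValueError(f"unexpected operator {op!r}")

def reduce_K(phi):
    """rho^sml_K(phi) = {{0} : t_phi} U rho_K({0} : t_phi -> phi)."""
    red = ReductionK()
    t = red.surrogate(phi)
    S0 = frozenset({0})
    return [("lit", S0, (t,))] + red.rho(S0, t, phi)

p = ("var", "p")
for clause in reduce_K(("and", ("dia", p), ("box", ("not", p)))):
    print(clause)
```

For φ = ◇p ∧ □¬p the sketch yields the five clauses {0} : t0, {0} : ¬t0 ∨ t1, {0} : ¬t0 ∨ t2, {0} : t1 → ◇p and {0} : t2 → □¬p. The other logics would differ only in the □ case, following their rows of Table 4.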

We can see in Table 4 that the reduction for KB4 has an additional SNF_sml clause * : t_{□ψ} ∨ t_{□¬t_{□ψ}} that occurs neither in the reduction for KB nor in that for K4. It can be seen as an encoding of the derived axiom ◇□ψ → □ψ that follows from the contrapositive ◇□ψ → ψ of B and from 4, □ψ' → □□ψ'.

For K45 we see that all the SNF_sml clauses in the reduction for K5 carry over. These clauses are already sufficient to ensure that, semantically, if t_{□ψ} is true at any world at a level other than 0, then t_{□ψ} is true at every world. Consequently, to accommodate axiom 4, it suffices to add the SNF_sml clause {0} : t_{□ψ} → □t_{□ψ} to ensure that this also holds for the root world at level 0.


Local Reductions for the Modal Cube 493 


L | P_L(S : t_{□ψ} → □ψ) | Δ_L(S : t_{□ψ} → □ψ)
K | S : t_{□ψ} → □η(ψ) | δ_L(S⁺, ψ)
KB | S : t_{□ψ} → □η(ψ), S⁻ : η(ψ) ∨ t_{□¬t_{□ψ}}, S⁻ : t_{□¬t_{□ψ}} → □¬t_{□ψ} | δ_L(S⁺ ∪ S⁻, ψ)
K4 | S^≥ : t_{□ψ} → □η(ψ), S^≥ : t_{□ψ} → □t_{□ψ} | δ_L((S⁺)^≥, ψ)
K5 | * : t_{□ψ} → □η(ψ) | δ_L(*, ψ)
* tots, V toy, * i totoy Ot wo 
| x: totoy — “toy, *: totoy — totoy 
KB4 | * : t_{□ψ} → □η(ψ) | δ_L(*, ψ)
x: np) Vet ato * i toy V tontoy, 
*: tonto, => Otay, *: toy > Otoy 
K45 | * : t_{□ψ} → □η(ψ), {0} : t_{□ψ} → □t_{□ψ} iff 0 ∈ S | δ_L(*, ψ)
* i totay V toy, KE Cota, Ot ws 
Kt totoy =>} “toy, *: totoy = totoy 
KDΣ | {lb^□_KΣ(S) : t_{□ψ} → ◇η(ψ)} ∪ P_KΣ(S : t_{□ψ} → □ψ) | δ_L(lb^◇_KΣ(S), ψ)
KTΣ | {lb^□_KΣ(S) : ¬t_{□ψ} ∨ η(ψ)} ∪ P_KΣ(S : t_{□ψ} → □ψ) | δ_L(lb^◇_KΣ(S) ∪ S, ψ)


where lb^□_KΣ and lb^◇_KΣ are defined as follows:


Table 4. Reduction of □-formulae, Σ ⊆ {B, 4, 5}.


L | K | KB | K4 | K5 | KB4 | K45
lb^□_L(S) | S | S | S^≥ | * | * | *
lb^◇_L(S) | S⁺ | S⁺ ∪ S⁻ | (S⁺)^≥ | * | * | *


For reductions of KDΣ and KTΣ we have favoured the reuse of reductions for KΣ, KD and KT over optimisation for specific logics. For example, take KBD. Given that in a symmetric model every world w except the root world w_0 has an R-successor, the axiom D only 'enforces' that w_0 also has an R-successor. So, instead of adding a clause S : t_{□ψ} → ◇η(ψ) for every clause S : t_{□ψ} → □η(ψ), we could just add {0} : t_{□ψ} → ◇η(ψ) iff 0 ∈ S. Similarly, in KT5, because of 5, for all worlds w except w_0 we already have w R w. So, we could again just add {0} : ¬t_{□ψ} ∨ η(ψ) for every clause S : t_{□ψ} → □η(ψ) iff 0 ∈ S.

For the KB4-unsatisfiable formula φ_1 = (¬p ∧ ◇◇□p), if we were to independently apply the reductions for KB and K4, that is, we compute {{0} : t_{φ_1}} ∪ ρ_KB({0} : t_{φ_1} → φ_1) ∪ ρ_K4({0} : t_{φ_1} → φ_1), then the result is the following set Φ_1 of clauses:


(1) {0} : t_{φ_1}
(2) {0} : ¬t_{φ_1} ∨ ¬p
(3) {0} : ¬t_{φ_1} ∨ t_{◇◇□p}
(4) {0} : t_{◇◇□p} → ◇t_{◇□p}
(5) {1} : t_{◇□p} → ◇t_{□p}
(6) {2}^≥ : t_{□p} → □p
(7) {2}^≥ : t_{□p} → □t_{□p}
(8) {1} : p ∨ t_{□¬t_{□p}}
(9) {1} : t_{□¬t_{□p}} → □¬t_{□p}


Clauses (1) to (5) stem from the transformation of φ_1 to SNF_sml for K, Clauses (6) and (7) stem from the reduction for 4, and Clauses (8) and (9) stem from the reduction for B. This set of SNF_sml clauses is K-satisfiable. The clauses imply {1} : p, but neither {1} : □p nor {0} : p, which we would need to obtain a contradiction. Part of the reason is that we would need to apply the reductions for 4 and B recursively to newly introduced surrogates for □-formulae, which in turn leads to the introduction of further surrogates and problems with the termination of the reduction.

In contrast, the clause set Φ_2 obtained by our reduction for KB4 is:


(10) {0} : t_{φ_1}
(11) {0} : ¬t_{φ_1} ∨ ¬p
(12) {0} : ¬t_{φ_1} ∨ t_{◇◇□p}
(13) {0} : t_{◇◇□p} → ◇t_{◇□p}
(14) {1} : t_{◇□p} → ◇t_{□p}
(15) * : t_{□p} → □p
(16) * : t_{□p} → □t_{□p}
(17) * : p ∨ t_{□¬t_{□p}}
(18) * : t_{□¬t_{□p}} → □¬t_{□p}
(19) * : t_{□p} ∨ t_{□¬t_{□p}}
(20) * : t_{□¬t_{□p}} → □t_{□¬t_{□p}}


Note Clauses (19) and (20) in Φ_2, for which there are no corresponding clauses in Φ_1. Also, the sets of labels of Clauses (15) to (18) are strict supersets of those of the corresponding Clauses (6) to (9). Φ_2 implies both {1} : □p and {0} : p. The latter, together with Clauses (10) and (11), means that Φ_2 is K-unsatisfiable.


Theorem 1. Let φ be a modal formula in simplified NNF, Σ ⊆ {B, D, T, 4, 5}, and Φ_KΣ = ρ_KΣ^sml(φ). Then φ is KΣ-satisfiable iff Φ_KΣ is K-satisfiable.


Proof (Sketch). For |Σ| ≤ 1 this follows from Theorem 5 in [29].

For K45, KB4, KDΣ', and KTΣ' with Σ' ⊆ {B, 4, 5} we proceed in analogy to the proofs of Theorems 3 and 4 in [29]. Let L be one of these logics.

To show that if φ is L-satisfiable then Φ_L is K-satisfiable, we show that, given a rooted L-model M of φ, a small variation of the unravelling of M is a rooted tree K-model M_L of Φ_L. The main step is to define the valuation of the additional propositional symbols t_ψ so that we can prove that all clauses in Φ_L hold in M_L. To show that if Φ_L is K-satisfiable then φ is L-satisfiable, we take a rooted tree K-model M = ⟨W, R, V, w_0⟩ of Φ_L and construct a Kripke structure M_L = ⟨W, R', V, w_0⟩. The relation R' is the closure of R under the relational properties associated with the axioms of L. The proof that M_L is a model of φ relies on the fact that the clauses in Φ_L ensure that, for subformulae □ψ of φ, ψ will be true at all worlds reachable via R' from a world where □ψ is true.


4 From SNF_sml to SNF_ml


As KSP does not support SNF_sml, in our evaluation of the effectiveness of the reductions defined in Sect. 3 we have used a transformation from SNF_sml to SNF_ml. An alternative approach would be to reflect the use of SNF_sml in the calculus and re-implement the prover. Whilst we believe that redesigning the calculus presents few problems, re-implementing KSP needs more thought, in particular regarding how to represent infinite sets. The route we adopt here allows us to experiment with the approach in general without having to change the prover. For extensions of K with one or more of the axioms B, D, T, such a transformation is straightforward, as the sets of modal levels occurring in the normal form of modal formulae are all finite. Thus, instead of a single SNF_sml clause S : ¬t_ψ ∨ η(ψ) we can use the finite set of SNF_ml clauses {ml : ¬t_ψ ∨ η(ψ) | ml ∈ S}.

Table 5. Bounds on the length of prefixes in SST tableaux

Logic L | Bound d_b^L
K, KD, KT, KB, KDB, KTB | 1 + d^φ
K4, S4 | 2 + d_◇^φ + n_◇^φ × n_□^φ
KD4 | 2 + d_◇^φ + (max(1, n_◇^φ) × n_□^φ)
KB4, KTB4, K5, S5, K45 | 2 + d_◇^φ + n_◇^φ
KD5 | 2 + d_◇^φ + max(1, n_◇^φ)

For extensions of K with at least one of the axioms 4 and 5, potentially together with other axioms, the sets of modal levels labelling clauses are in general infinite. For each logic L it is, however, possible to define a computable function that maps the modal formula φ under consideration onto a bound d_b^L such that restricting the modal levels in the normal form of φ by d_b^L preserves satisfiability equivalence.

To establish the bound and prove satisfiability equivalence, we need to introduce the basic notions of Single Step Tableaux (SST) calculi for a modal logic L [14,21], which use sequences of natural numbers to prefix modal formulae in a tableau. The SST calculus consists of a set of rules, with the (π) rule being the only rule increasing the length of prefixes (i.e., σ : ◇φ / σ.n : φ with σ.n new on the branch). For a logic L, an L-tableau T in the SST calculus for a modal formula φ is a (binary) tree where the root of T is labelled with 1 : φ, and every other node is labelled with a prefixed formula σ : ψ obtained by application of a rule of the calculus. A branch B is a path from the root to a leaf. A branch B is closed if it contains either false or a propositional contradiction at the same prefix. A tableau T is closed if all its branches are closed. A prefixed formula σ : ψ is reduced for rule (r) in B if the branch B already contains the conclusion of such a rule application. By a systematic tableau construction we mean an application of the procedure in [14, p. 374] adapted to SST rules.

For each logic L, we establish its bound by considering an L-SST calculus, where a modal level in an SNF_sml clause corresponds to the length of a prefix in an SST tableau. The bound then either follows from an already known bound on the length of prefixes in an SST tableau preserving correctness of the SST calculus, or we establish such a bound ourselves. To prove satisfiability equivalence, we show that, for a closed SST tableau with such a bound on the length of prefixes in place, we can construct a resolution refutation of a set of SNF_sml or SNF_ml clauses with a corresponding bound on modal levels in those clauses.

For a modal formula φ in simplified NNF, let d^φ be the modal depth of φ, d_◇^φ be the maximal nesting of ◇-operators not under the scope of any □-operators in φ, n_□^φ be the number of □-subformulae in φ, and n_◇^φ be the number of ◇-subformulae below □-operators in φ. Our results for the bounds on the length of prefixes in SST tableaux can then be summarised by the following theorem.


Theorem 2. Let L = KΣ with Σ ⊆ {B, D, T, 4, 5}. A systematic tableau construction of an L-tableau for a modal formula φ in simplified NNF under the following constraints (TC1) and (TC2)

(TC1) a rule (r) of the SST calculus is only applicable to a prefixed formula σ : ψ in a branch B if the formula is not already reduced for (r) in B;

(TC2) rule (π) of the SST calculus is only applicable to prefixed formulae σ : ◇ψ with |σ| < d_b^L, for d_b^L as defined in Table 5,

terminates in one of the following states:

(1) all branches of the constructed tableau are closed and φ is L-unsatisfiable, or
(2) at least one branch B is not closed, no rule is still applicable to a labelled formula in B, and φ is L-satisfiable.


The proof is analogous to Massacci's [21, Section B.2]. Note that for the logics KD4 and KD5, we use max(1, n_◇^φ) in the calculation of the bound. That is, if n_◇^φ ≥ 1 then max(1, n_◇^φ) = n_◇^φ and the bound is the same as for K4 and K5. Otherwise max(1, n_◇^φ) = 1, that is, the bound is the same as for a formula with a single ◇-subformula below □-operators in φ.
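The parameters and the bounds of Table 5 are straightforward to compute. The following sketch uses hypothetical function names and a tuple encoding of formulae; it counts distinct subformulae for n_□^φ and n_◇^φ, which is our reading of "number of subformulae", and raises a KeyError for logics not listed in Table 5. For □false it gives 2 for K4 but 3 for KD4, where max(1, n_◇^φ) applies.

```python
# Hypothetical tuple encoding of formulae in simplified NNF:
# ("true",), ("false",), ("var", "p"), ("not", ("var", "p")),
# ("and", f, g), ("or", f, g), ("box", f), ("dia", f)

def children(f):
    return [g for g in f[1:] if isinstance(g, tuple)]

def modal_depth(f):                       # d^phi
    d = max((modal_depth(g) for g in children(f)), default=0)
    return d + 1 if f[0] in ("box", "dia") else d

def dia_nesting(f):                       # d_dia^phi: <>-nesting outside any []
    if f[0] == "box":
        return 0
    d = max((dia_nesting(g) for g in children(f)), default=0)
    return d + 1 if f[0] == "dia" else d

def subformulae(f, acc=None):
    acc = set() if acc is None else acc
    acc.add(f)
    for g in children(f):
        subformulae(g, acc)
    return acc

def dia_below_box(f, under_box=False, acc=None):  # <>-subformulae in the scope of a []
    acc = set() if acc is None else acc
    if f[0] == "dia" and under_box:
        acc.add(f)
    for g in children(f):
        dia_below_box(g, under_box or f[0] == "box", acc)
    return acc

def bound(logic, f):
    """d_b^L from Table 5, counting distinct subformulae."""
    d, d_dia = modal_depth(f), dia_nesting(f)
    n_box = sum(1 for g in subformulae(f) if g[0] == "box")
    n_dia = len(dia_below_box(f))
    table = {
        **dict.fromkeys(["K", "KD", "KT", "KB", "KDB", "KTB"], 1 + d),
        **dict.fromkeys(["K4", "S4"], 2 + d_dia + n_dia * n_box),
        "KD4": 2 + d_dia + max(1, n_dia) * n_box,
        **dict.fromkeys(["KB4", "KTB4", "K5", "S5", "K45"], 2 + d_dia + n_dia),
        "KD5": 2 + d_dia + max(1, n_dia),
    }
    return table[logic]  # KeyError for logics not listed in Table 5

box_false = ("box", ("false",))
print(bound("K4", box_false), bound("KD4", box_false))  # 2 3
```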

For K, KD, KT, KB and KDB these bounds were already stated in [21, Tables III and IV]. The bound for KTB follows straightforwardly from those for KB and KDB. For KD4, Massacci [21, Tables III and IV] states the bound to be the same as for K4. However, this is not correct for the case that the formula φ contains no ◇-formulae, where that bound would simply be 2, independent of φ. For example, the formula □false, which is KD4-unsatisfiable, does not have a closed KD4-tableau with this bound. For the other logics the bounds are new. As argued in [21], the bounds allow tableau decision procedures for extensions of K with axioms 4 and 5 that do not require a loop check and are therefore of wider interest.

Note that in KT4, 00% and Oy are equivalent and so are O(WAOV) and O(wA 
0). So, it makes sense to further simplify KT4 formulae using such equivalences 
before computing the normal form and the bound with the benefit that it may not 
only reduce the bound but also the size of the normal form. Similar equivalences 
that can be used to reduce the number of modal operators in a formula also 
exist for other logics, see, e.g., [8, Chapter 4]. 

To establish a relationship between closed tableaux and resolution refutations of a set of SNF_ml clauses, we formally define the modal-layered resolution calculus. Table 6 shows the inference rules of the calculus restricted to labels occurring in our normal form. For GEN1 and GEN3, if the modal clauses in the premises occur at modal level ml, then the literal clause in the premises occurs at modal level ml + 1.
Let Φ be a set of SNF_ml clauses. A (resolution) derivation from Φ is a sequence of sets Φ_0, Φ_1, ... where Φ_0 = Φ and, for each i ≥ 0, Φ_{i+1} = Φ_i ∪ {D}, where D ∉ Φ_i is the resolvent obtained from Φ_i by an application of one of the inference rules to premises in Φ_i. A (resolution) refutation of Φ is a derivation Φ_0, ..., Φ_k, k ∈ ℕ, where 0 : false ∈ Φ_k.
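To illustrate derivations and refutations, the following sketch saturates a set of SNF_ml clauses under two of the MLR rules of Table 6, LRES and MRES. It omits GEN1, GEN2 and GEN3, so it is incomplete in general, and failing to derive 0 : false does not establish satisfiability; the clause encoding and function names are assumptions for this example. The example is the SNF_ml form of ◇p ∧ □¬p with all clauses at level 0, for which one MRES step followed by three LRES steps yields 0 : false.

```python
from itertools import product

# Hypothetical SNF_ml encoding: ("lit", ml, frozenset_of_literals),
# ("box", ml, l1, l2) for ml : l1 -> []l2, ("dia", ml, l1, l2) for ml : l1 -> <>l2.
# Literals are strings "p" / "~p"; the empty literal clause stands for false.

def neg(l):
    return l[1:] if l.startswith("~") else "~" + l

def lres(c1, c2):
    """LRES: from ml : D v l and ml : D' v ~l infer ml : D v D'."""
    out = set()
    if c1[1] == c2[1]:
        for l in c1[2]:
            if neg(l) in c2[2]:
                out.add(("lit", c1[1], (c1[2] - {l}) | (c2[2] - {neg(l)})))
    return out

def mres(b, d):
    """MRES: from ml : l1 -> []l and ml : l2 -> <>~l infer ml : ~l1 v ~l2."""
    if b[1] == d[1] and d[3] == neg(b[3]):
        return {("lit", b[1], frozenset({neg(b[2]), neg(d[2])}))}
    return set()

def saturate(clauses):
    """Close a clause set under LRES and MRES only (GEN1, GEN2, GEN3 omitted)."""
    known = set(clauses)
    while True:
        lits = [c for c in known if c[0] == "lit"]
        boxes = [c for c in known if c[0] == "box"]
        dias = [c for c in known if c[0] == "dia"]
        new = set()
        for c1, c2 in product(lits, lits):
            new |= lres(c1, c2)
        for b, d in product(boxes, dias):
            new |= mres(b, d)
        if new <= known:
            return known
        known |= new

def refuted(clauses):
    """True iff 0 : false is derivable with the two rules implemented here."""
    return ("lit", 0, frozenset()) in saturate(clauses)

# SNF_ml clauses for <>p & []~p, all at level 0.
unsat = [("lit", 0, frozenset({"t0"})), ("lit", 0, frozenset({"~t0", "t1"})),
         ("lit", 0, frozenset({"~t0", "t2"})),
         ("dia", 0, "t1", "p"), ("box", 0, "t2", "~p")]
print(refuted(unsat))  # True
```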

To map a set of SNF_sml clauses to a set of SNF_ml clauses, using a bound n ∈ ℕ on the modal levels, we define a function db_n on clauses and sets of clauses in SNF_sml as follows:

$$
db_n(S : \varphi) = \{ ml : \varphi \mid ml \in S \text{ and } ml \le n \}
\qquad
db_n(\Phi) = \bigcup_{S : \varphi \in \Phi} db_n(S : \varphi)
$$

Note that prefixes in SST tableaux have a minimal length of 1, while the minimal modal level in SNF_ml clauses is 0. So, a prefix of length n in a prefixed formula corresponds to modal level n − 1 in an SNF_ml clause.
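A sketch of db_n on a hypothetical label representation, namely a finite part plus an optional tail {n | n ≥ k}. Clause bodies are treated as opaque values, since db_n only inspects labels. With a prefix-length bound of 3, the call db_2 keeps levels 0, 1 and 2, in line with the offset between prefix lengths and modal levels noted above.

```python
from typing import FrozenSet, Hashable, List, Optional, Tuple

# Hypothetical label encoding: (finite, tail) stands for finite U {n | n >= tail};
# tail None means there is no infinite part, and * is (frozenset(), 0).
Label = Tuple[FrozenSet[int], Optional[int]]

def levels_upto(label: Label, n: int) -> List[int]:
    """The levels ml in the label with ml <= n, in increasing order."""
    finite, tail = label
    kept = {ml for ml in finite if ml <= n}
    if tail is not None:
        kept.update(range(tail, n + 1))
    return sorted(kept)

def db(n: int, clauses: List[Tuple[Label, Hashable]]) -> List[Tuple[int, Hashable]]:
    """db_n: each S : phi becomes the SNF_ml clauses ml : phi with ml in S, ml <= n."""
    return [(ml, body) for label, body in clauses for ml in levels_upto(label, n)]

clauses = [
    ((frozenset({0}), None), "t0"),          # {0} : t0
    ((frozenset(), 0), "t1 -> []p"),         # *   : t1 -> []p
    ((frozenset({1}), 3), "~t2 | q"),        # {1} U {n >= 3} : ~t2 | q
]
prefix_bound = 3                             # maximal prefix length, i.e. levels 0 .. 2
for clause in db(prefix_bound - 1, clauses):
    print(clause)
```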

The proof of the following theorem then takes advantage of the fact that we have surrogates and associated clauses for each subformula of φ, and proceeds by induction over applications of rule (π).


Theorem 3. Let L = KΣ with Σ ⊆ {B, D, T, 4, 5}, φ be a KΣ-unsatisfiable formula in simplified NNF, d_b^L be as defined in Table 5, and Φ_L = ρ_L^ml(φ) = db_{d_b^L − 1}(ρ_L^sml(φ)). Then there is a resolution refutation of Φ_L.


Regarding the size of the encoding, we note that, ignoring the labelling sets, the reduction ρ_L^sml into SNF_sml is linear with respect to the size of the original formula. The size including the labelling sets would depend on the exact representation of those sets, in particular of infinite sets. As those are not arbitrary, there is still an overall polynomial bound on the size of the sets of SNF_sml clauses produced by ρ_L^sml. When transforming clauses from SNF_sml into SNF_ml we may need to add every clause to all levels within the bounds provided by Theorem 3. The parameters for calculating those bounds, d^φ, d_◇^φ, n_◇^φ, and n_□^φ, are all themselves linearly bounded by the size of the formula. Thus, in the worst case, which is S4, the size of the clause set produced by ρ_L^ml is bounded by a polynomial of degree 3 with respect to the size of the original formula.

It is worth pointing out that both the reduction ρ_L^sml of a modal formula to SNF_sml and the reduction ρ_L^ml to SNF_ml are also reversible, that is, we can reconstruct the original formula from the SNF_sml and from the SNF_ml clause set obtained by ρ_L^sml or ρ_L^ml, respectively. This reconstruction can also be performed in polynomial time. Thus the reduction itself does not affect the complexity of the satisfiability problem. For instance, the satisfiability problem for S5 is NP-complete, and so is the satisfiability problem of the subclass C_S5 of SNF_ml clause sets that can be obtained as the result of an application of ρ_S5^ml to a modal formula. However, a generic decision procedure for K will not be a complexity-optimal decision procedure for C_S5.
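Table 6. Inference rules of the MLR calculus

$$
\begin{array}{c}
\text{LRES}\ \dfrac{ml : D \lor l \qquad ml : D' \lor \neg l}{ml : D \lor D'}
\qquad
\text{MRES}\ \dfrac{ml : l_1 \to \Box l \qquad ml : l_2 \to \Diamond \neg l}{ml : \neg l_1 \lor \neg l_2}
\qquad
\text{GEN2}\ \dfrac{ml : l'_1 \to \Box l_1 \qquad ml : l'_2 \to \Box \neg l_1 \qquad ml : l'_3 \to \Diamond l_2}{ml : \neg l'_1 \lor \neg l'_2 \lor \neg l'_3}
\\[4ex]
\text{GEN1}\ \dfrac{ml : l'_1 \to \Box \neg l_1 \quad \cdots \quad ml : l'_m \to \Box \neg l_m \qquad ml : l' \to \Diamond \neg l \qquad ml+1 : l_1 \lor \cdots \lor l_m \lor l}{ml : \neg l'_1 \lor \cdots \lor \neg l'_m \lor \neg l'}
\\[4ex]
\text{GEN3}\ \dfrac{ml : l'_1 \to \Box \neg l_1 \quad \cdots \quad ml : l'_m \to \Box \neg l_m \qquad ml : l' \to \Diamond l \qquad ml+1 : l_1 \lor \cdots \lor l_m}{ml : \neg l'_1 \lor \cdots \lor \neg l'_m \lor \neg l'}
\end{array}
$$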


5 Evaluation 


An empirical evaluation of the practical usefulness of the reductions we presented 
in Sects. 3 and 4 faces the challenge that there is no substantive collection of 
benchmark formulae for the 15 logics of the modal cube except for basic modal 
logic. Catach [7] evaluates his prover on 31 modal formulae with a maximal 
length of 22 and maximal modal depth of 4. They are not sufficiently challeng- 
ing. The QMLTP Problem Library for First-Order Modal Logics [32] focuses on 
quantified formulae and contains only a few formulae taken from the research 
literature that are purely propositional and were not written for the basic modal 
logic K. The Logics Workbench (LWB) benchmark collection [2] contains formu- 
lae for K, KT and S4 but not for any of the other logics we consider. For each 
of these three logics, the collection consists of 18 parameterised classes with 21 
formulae each, plus scripts with which further formulae could be generated if 
needed. All formulae in 9 classes are satisfiable and all formulae in the other 9 
classes are unsatisfiable in the respective logic. 

In [29] we used the 18 classes of the LWB benchmark collection for K to evaluate our approach for the six logics consisting of K and its extensions with a single axiom. One drawback of using these 18 classes for other modal logics is that formulae that are K-satisfiable are not necessarily KΣ-satisfiable for non-empty sets Σ of additional axioms. For example, for K5, only 60 out of 180 K-satisfiable formulae were K5-satisfiable. Another drawback is that while K-unsatisfiable formulae are also KΣ-unsatisfiable, a resolution refutation would not necessarily involve any of the additional clauses introduced by our reduction for KΣ. It may be that the additional clauses allow us to find a shorter refutation, but it may just be a case of finding the same refutation in a larger search space. It is also worth recalling that simplification alone is sufficient to determine that all formulae in the class k_lin_p are K-unsatisfiable, while pure literal elimination can be used to reduce all formulae in k_grz_p to the same simple formula [26].
Table 7. Logic-specific modification of unsatisfiable benchmark formulae


Logic Li? Logic Li? 

K false KD4 (Ogp A OO0O-7@p) 

KB (“dp A© Ip) K5 (Onqp AC Ip) 

KDB (~qp A OD((O=qp AOg;) V dp)) |KD5 | ((>gp A Ogp) V (OOg, A Ong) 
KTB  (=qp ACO((=q, ^ Oq) V qp)) ||K45  (Eqp A O0g), A OO(>Gp V 795) 
KD (O-gp ^ Oqp) KD45 = ((O-q, ^ Og,) A 

KT (Gp ^ Uap) (Ogp A O0g, A O(a V =4p)) 
K4 (Cap A OO~qp) S4 (>q, ^ O(=q, V Ogp) A OOn7@p) 
K4B  |(~qp A OOD Gp) S5 ((=Gp A Ogp) V (=a, A OOOUG,) 


Thus, some of the classes evaluate the preprocessing capabilities of a prover but 
not the actual calculus and its implementation. 

We therefore propose a different approach here. The principles underlying 
our approach are that (i) there should be the same number of formulae for 
each logic though not necessarily the same formulae across all logics; (ii) there 
should be an equal number of satisfiable and unsatisfiable formulae for each logic; 
(iii) a formula that is L-unsatisfiable should be L’-unsatisfiable only if L’ is an
extension of L; (iv) a formula that is L’-satisfiable should be L-satisfiable
for every extension L’ of L; (v) the formulae should belong to parameterised 
classes of formulae of increasing difficulty. Note that Principles (iii) and (iv) are 
intentionally not symmetric. For L-unsatisfiable formulae it should be necessary 
for a prover to use the rules or clauses specific to L instead of being able to find 
a refutation without those. For L-satisfiable formulae we want to maximise the 
search space for a model. 

For unsatisfiable formulae, we take the five LWB classes k_branch_p,
k_path_p, k_ph_p, k_poly_p, k_t4p_p and for each logic L in the modal cube
transform each formula in a class so that it is L-unsatisfiable, but L’-satisfiable for
any logic L’ that is not an extension of L. The transformation proceeds by first
converting a formula φ to simplified NNF. Then for each propositional literal l
it replaces all its occurrences by (l ∨ ψ_p^L), where |l| = p and ψ_p^L is a modal formula
uniquely associated with p and L, resulting in a formula φ’. Finally, for logics
KD4 and KDB we need to add a disjunct (□q ∧ □¬q) to φ’, while for logics S4
and KTB we need to add a disjunct (q ∧ □¬q), where q is a propositional symbol
not occurring in φ’. These disjuncts are unsatisfiable in the respective logics but
satisfiable in logics where D or T, respectively, does not hold. Table 7 shows the
formulae ψ_p^L that we use in our evaluation. In the table, q_p and q’_p are propositional
variables uniquely associated with p that do not occur in φ. The overall effect of this
transformation is that the resulting classes of formulae satisfy Principles (iii)
and (v).
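The literal-replacement step can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in our experiments: the tuple encoding of NNF formulae and the naming scheme `q_<p>` for the variable q_p are hypothetical, only the K, KD and KT rows of Table 7 are included, and the extra disjuncts for KD4, KDB, S4 and KTB are omitted.

```python
from typing import Callable, Dict, Tuple

# Hypothetical encoding of NNF modal formulae as nested tuples:
#   ("lit", name, positive), ("and", A, B), ("or", A, B),
#   ("box", A), ("dia", A), ("false",), ("true",)
Formula = Tuple


def lit(name: str, positive: bool = True) -> Formula:
    return ("lit", name, positive)


def fresh_q(p: str) -> str:
    # Hypothetical naming scheme for the variable q_p uniquely associated with p.
    return f"q_{p}"


# Illustrative subset of Table 7: psi_p^L for L in {K, KD, KT}.
PSI: Dict[str, Callable[[str], Formula]] = {
    "K": lambda p: ("false",),
    "KD": lambda p: ("and", ("box", lit(fresh_q(p), False)), ("box", lit(fresh_q(p)))),
    "KT": lambda p: ("and", lit(fresh_q(p)), ("box", lit(fresh_q(p), False))),
}


def transform(phi: Formula, logic: str) -> Formula:
    """Replace every literal l with |l| = p by (l or psi_p^L); phi must be in NNF."""
    tag = phi[0]
    if tag == "lit":
        # psi depends only on the variable p, not on the polarity of l.
        return ("or", phi, PSI[logic](phi[1]))
    if tag in ("and", "or"):
        return (tag, transform(phi[1], logic), transform(phi[2], logic))
    if tag in ("box", "dia"):
        return (tag, transform(phi[1], logic))
    return phi  # constants are left unchanged


if __name__ == "__main__":
    phi = ("and", lit("p"), ("dia", lit("p", False)))
    print(transform(phi, "KT"))
```

Because the substitution only touches literals, the modal structure of the original LWB formula is preserved, which is what keeps the scaling behaviour of each parameterised class intact.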

For satisfiable formulae, we use the five classes k_poly_n, s4_md_n, s4_ph_n,
s4_path_n, s4_s5_n without modification. Although the first of these classes was 
designed to be K-satisfiable and the other four to be S4-satisfiable, the formulae 
in those classes are satisfiable in all the logics we consider. s4_ipc_n also consists 


500 C. Nalon et al. 


Table 8. Benchmarking results

Logic  Status Total     GMR     GMR      GMR   R+MLR   R+MLR    R+MLR  LEO-III+E
                     (cneg)  (cord) (cplain)  (cneg)  (cord) (cplain)
K      S        100      84      85       77     100     100      100          0
KD     S        100      84      85       77      96     100       93          0
KT     S        100      70      81       50      66      68       61          0
KB     S        100      58      58       29      51      64       51          0
K4     S        100      83      85       77      56      57       50          0
K5     S        100      67      60       45      36      37       26          0
KDB    S        100      63      70       40      56      73       55          0
KTB    S        100      58      59       38      52      57       31          0
KD4    S        100      83      85       77      52      53       46          0
KD5    S        100      73      70       61      46      47       38          0
K45    S        100      45      53       34      36      37       25          0
K4B    S        100      18      19       11      23      38       15          0
KD45   S        100      67      66       56      46      47       38          0
S4     S        100      66      76       48      45      44       33          0
S5     S        100      32      28       32      32      35       24          0
All    S       1500     951     980      752     793     857      686          0
K      U        100      74      76       71      79      78       77         21
KD     U        100      74      76       71      73      75       62         13
KT     U        100      74      77       70      71      74       67         30
KB     U        100      71      78       68      71      52       55         10
K4     U        100      55      52       57      41      29       35          4
K5     U        100      74      46       75      50      30       48          8
KDB    U        100      73      77       71      73      52       56          8
KTB    U        100      72      77       69      67      50       53          9
KD4    U        100      70      59       67      40      32       39          1
KD5    U        100      75      46       77      51      40       46          3
K45    U        100      51      37       49      16      12        8          3
K4B    U        100      47      52       46      53      30       49          5
KD45   U        100      64      43       55      33      22       28          1
S4     U        100      47      68       66      45      21       23          4
S5     U        100      47      51       52      36      13       29          2
All    U       1500     968     915      964     799     610      675        122


only of S5-satisfiable formulae but these appear to be insufficiently challenging 
and have not been included in our benchmark set. All other LWB benchmark
classes for K and S4 are satisfiable in some of the logics, but not
in all. The five classes satisfy Principles (iv) and (v). The benchmark collection 
consisting of all ten classes together then also satisfies Principles (i) and (ii). 
Another challenge for an empirical evaluation is the lack of available fully
automatic theorem provers for all 15 logics, which we have already discussed in
Sect. 1. This leaves us with just three different approaches we can compare: (i) the
higher-order logic prover LEO-III [12,37] with E 2.6 as external reasoner, LEO-III+E
for short, which supports a wide range of logics via semantic embedding
into higher-order logic; (ii) the combination of our reductions with the
modal-layered resolution (MLR) calculus for SNF_ml clauses [25], R+MLR calculus for
short, implemented in the modal theorem prover KSP; (iii) the global modal
resolution (GMR) calculus, implemented in KSP, which has resolution rules for
all 15 logics [24]. For the R+MLR and GMR calculi, resolution inferences between
literal clauses can either be unrestricted (cplain option), restricted by
negative resolution (cneg option), or restricted by an ordering (cord option). It is
worth pointing out that negative and ordered resolution require slightly
different transformations to the normal form that introduce additional clauses
(snf+ and snf++ options, respectively). Also, the ordering cannot be
arbitrary [25]. For the experiments, we have used the following options: (i) input
processing: prenexing, together with simplification and pure literal elimination
(bnfsimp, prenex, early_ple); (ii) preprocessing of clauses: renaming reuses
symbols (limited_reuse_renaming), forward and backward subsumption (fsub,
bsub) are enabled; the usable set is populated with clauses whose maximal literal is
positive (populate_usable, max_lit_positive); pure literal elimination is set
for GMR (ple) and modal-level pure literal elimination is set for MLR (mlple);
(iii) processing: inference rules not required for completeness are also used
(unit, lhs_unit, mres), the options for preprocessing of clauses are kept and clause
selection takes the shortest clause by level (shortest).

For LEO-III we provide the prover with a modal formula in the syntax it 
expects plus a logic specification that tells the prover in which modal logic 
the formula is meant to be solved, for example, $modal_system_S4. LEO-III 
can collaborate with external reasoners during proof search and we have used 
E 2.6 [34,35] as external reasoner and restricted LEO-III to one instance of E 
running in parallel. LEO-III is implemented in Java and we have set the maxi- 
mum heap size to 1 GB and the thread stack size to 64 MB for the JVM. 

Table 8 shows our benchmarking results. The first three columns of the table 
show the logic in which we determine the satisfiability status of each formula, 
the satisfiability status of the formulae, and their number. The next six columns 
then show how many of those formulae were solved by KSP with a particular
calculus and refinement. The last column shows the result for LEO-III. The 
highest number or numbers are highlighted in bold. A time limit of 100 CPU 
seconds was set for each formula. Benchmarking was performed on a PC with 
an AMD Ryzen 5 5600X CPU @ 4.60 GHz max and 64GB main memory using 
Fedora release 34 as operating system. 
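As a quick consistency check on Table 8, the Python snippet below re-adds the per-logic counts and reproduces the two "All" rows. It is only an illustration of the arithmetic behind those rows; the numbers are copied from the table and the data layout is our own.

```python
# Per-logic counts of solved formulae (out of 100 each), copied from Table 8.
LOGICS = ["K", "KD", "KT", "KB", "K4", "K5", "KDB", "KTB",
          "KD4", "KD5", "K45", "K4B", "KD45", "S4", "S5"]

SOLVED = {
    "S": {
        "GMR (cneg)":     [84, 84, 70, 58, 83, 67, 63, 58, 83, 73, 45, 18, 67, 66, 32],
        "GMR (cord)":     [85, 85, 81, 58, 85, 60, 70, 59, 85, 70, 53, 19, 66, 76, 28],
        "GMR (cplain)":   [77, 77, 50, 29, 77, 45, 40, 38, 77, 61, 34, 11, 56, 48, 32],
        "R+MLR (cneg)":   [100, 96, 66, 51, 56, 36, 56, 52, 52, 46, 36, 23, 46, 45, 32],
        "R+MLR (cord)":   [100, 100, 68, 64, 57, 37, 73, 57, 53, 47, 37, 38, 47, 44, 35],
        "R+MLR (cplain)": [100, 93, 61, 51, 50, 26, 55, 31, 46, 38, 25, 15, 38, 33, 24],
        "LEO-III+E":      [0] * 15,
    },
    "U": {
        "GMR (cneg)":     [74, 74, 74, 71, 55, 74, 73, 72, 70, 75, 51, 47, 64, 47, 47],
        "GMR (cord)":     [76, 76, 77, 78, 52, 46, 77, 77, 59, 46, 37, 52, 43, 68, 51],
        "GMR (cplain)":   [71, 71, 70, 68, 57, 75, 71, 69, 67, 77, 49, 46, 55, 66, 52],
        "R+MLR (cneg)":   [79, 73, 71, 71, 41, 50, 73, 67, 40, 51, 16, 53, 33, 45, 36],
        "R+MLR (cord)":   [78, 75, 74, 52, 29, 30, 52, 50, 32, 40, 12, 30, 22, 21, 13],
        "R+MLR (cplain)": [77, 62, 67, 55, 35, 48, 56, 53, 39, 46, 8, 49, 28, 23, 29],
        "LEO-III+E":      [21, 13, 30, 10, 4, 8, 8, 9, 1, 3, 3, 5, 1, 4, 2],
    },
}


def totals(status: str) -> dict:
    """Column sums over all 15 logics, i.e. the 'All' row for one status."""
    return {column: sum(counts) for column, counts in SOLVED[status].items()}


if __name__ == "__main__":
    for status in ("S", "U"):
        for column, total in totals(status).items():
            # 1500 formulae per status, so total / 15 is the percentage solved.
            print(f"{status}  {column:<15} {total:>4} / 1500  ({total / 15:.1f}%)")
```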

While the R+MLR calculus is competitive with GMR on extensions of K 
with axioms D, T and, possibly, B, the GMR calculus has better performance 
on extensions with axioms 4 and 5. 

On satisfiable formulae, where for all logics we use exactly the same formulae 
and both resolution calculi have to saturate the set of clauses up to redundancy, 
the number of formulae solved is directly linked to the number of inferences
necessary to do so. The fact that we reduce SNF_sml clauses to SNF_ml clauses via
the introduction of multiple copies of the same clausal formulae with different
labels clearly leads to a corresponding multiplication of the inferences that need
to be performed. LEO-III+E does not solve any of the satisfiable formulae. This
illustrates how important it is to use additional techniques that can turn
resolution into a decision procedure for embeddings of modal logics into
first-order logic [18,33].

On unsatisfiable formulae, where we use different formulae for each logic, 
the number of formulae solved is linked to the number of inferences it takes to 
find a refutation. For instance, on K the GMR calculus on average requires 6.2
times as many inferences to find a refutation as the R+MLR calculus.
However, for all other logics the opposite is true. On the remaining 14 logics, the
R+MLR calculus on average requires 6.5 times as many inferences to find
a refutation as the GMR calculus. Given that the R+MLR calculus currently
uses a reduction from a modal logic to SNF_sml followed by a transformation
from SNF_sml to SNF_ml, it is difficult to discern which of the two is the major
problem. It is clear that multiple copies of the same clausal formulae are also
detrimental to proof search. LEO-III+E does reasonably well on unsatisfiable
formulae and the results clearly show the impact that additional axioms have on
its performance. It performs best for KT and K, but for logics involving axioms
4 and 5 very few formulae can be solved. The external prover E finds the proof
for 121 out of the 122 modal formulae LEO-III+E can solve.


6 Conclusions 


We have presented novel reductions of extensions of the modal logic K with
arbitrary combinations of the axioms B, D, T, 4, 5 to the clausal normal forms
SNF_sml and SNF_ml for K. The implementation of those reductions combined
with KSP [26] allows us to reason in all 15 logics of the modal cube in a fully
automatic way. Such support has so far been extremely limited.

The transformation of sets of SNF_sml clauses to sets of SNF_ml clauses relies on new results
that show that non-clausal closed tableaux in the Single Step Tableaux calculus
[14,21] can be simulated by refutations in the modal-layered resolution (MLR)
calculus for SNF_ml clauses [25].

We have also developed a new collection of benchmark formulae that covers 
all 15 logics of the modal cube. The collection consists of classes of parameterised 
and therefore scalable formulae. It contains an equal number of satisfiable and 
unsatisfiable formulae for each logic and the satisfiability status of each formula is 
known in advance. Until now, extensive collections of benchmark formulae were only
available for K, with smaller collections available for KT and S4. A key feature
of the approach is that it uses the systematic modification of K-unsatisfiable 
formulae to obtain unsatisfiable formulae in other logics. Thus, we could obtain 
a more extensive collection by applying this approach to further collections of 
benchmark formulae for K. 
The evaluation we presented shows that on most of the 15 modal logics the
combination of our reduction to SNF_ml with the MLR calculus does not
perform as well as the global modal resolution (GMR) calculus, also implemented
in KSP. This contrasts with the evaluation in [29], where we only considered six
logics and used a different collection of benchmarks. We believe that the new
benchmark collection more clearly indicates weaknesses in the current approach,
in particular, the reduction from SNF_sml to SNF_ml. It is possible that an
implementation of a calculus that operates directly on sets of SNF_sml clauses would
perform considerably better as it avoids the repetition of clauses with different
labels. However, it does so by using potentially infinite sets of labels, which makes
an implementation challenging. We intend to explore this possibility in future
work.
reasoning via proof search. We show that previously proposed sequent
systems are cut-free incomplete for basic validities from Kleene Algebra 
(KA) and Propositional Dynamic Logic (PDL), over standard transla- 
tions. On the other hand, our system faithfully simulates known cyclic 
systems for KA and PDL, thereby inheriting their completeness results. 
A peculiarity of our system is its richer correctness criterion, exhibiting 
‘alternating traces’ and necessitating a more intricate soundness argu- 
ment than for traditional cyclic proofs. 
1 Introduction


Transitive Closure Logic (TCL) is the extension of first-order logic by an operator 
computing the transitive closure of definable binary relations. It has been studied 
by numerous authors, e.g. [15-17], and in particular has been proposed as a 
foundation for the mechanisation and automation of mathematics [1]. 

Recently, Cohen and Rowe have proposed non-wellfounded and cyclic sys- 
tems for TCL [9,11]. These systems differ from usual ones by allowing proofs to 
be infinite (finitely branching) trees, rather than finite ones, under some appro- 
priate global correctness condition (the ‘progressing criterion’). One particular 
feature of the cyclic approach to proof theory is the facilitation of automation, 
since complexity of inductive invariants is effectively traded off for a richer proof 
structure. In fact, this trade-off has recently been made formal, cf. [3,12], and
has led to successful applications to automated reasoning, e.g. [6,7,24,26,27]. 

In this work we investigate the capacity of cyclic systems to automate reason- 
ing in TCL. Our starting point is the demonstration of a key shortfall of Cohen 
and Rowe’s system: its cut-free fragment, here called TC_G, is unable to cyclically
prove even standard theorems of relational algebra, e.g. (a ∪ b)* = a*(ba*)* and
(aa ∪ aba)⁺ ≤ a⁺((ba⁺)⁺ ∪ a) (Theorem 12). An immediate consequence of
this is that cyclic proofs of TC_G do not enjoy cut-admissibility (Corollary 13).
On the other hand, these (in)equations are theorems of Kleene Algebra (KA) 
[18,19], a decidable theory which admits automation-via-proof-search thanks to 
the recent cyclic system of Das and Pous [14]. 

What is more, TCL is well-known to interpret Propositional Dynamic Logic 
(PDL), a modal logic whose modalities are just terms of KA, by a natural exten- 
sion of the ‘standard translation’ from (multi)modal logic to first-order logic (see, 
e.g., [4,5]). Incompleteness of cyclic TC_G for PDL over this translation is
inherited from its incompleteness for KA. This is in stark contrast to the situation for
modal logics without fixed points: the standard translation from K (and, indeed, 
all logics in the ‘modal cube’) to first-order logic actually lifts to cut-free proofs 
for a wide range of modal logic systems, cf. [21,22]. 

A closer inspection of the systems for KA and PDL reveals the stumbling 
block to any simulation: these systems implicitly conduct a form of ‘deep inference’,
by essentially reasoning underneath ∃ and ∧. Inspired by this observation,
we propose a form of hypersequents for predicate logic, with extra structure 
admitting the deep reasoning required. We present the cut-free system HTC and 
a novel notion of cyclic proof for these hypersequents. In particular, the incorpo- 
ration of some deep inference at the level of the rules necessitates an ‘alternating’ 
trace condition corresponding to alternation in automata theory. 

Our first main result is the Soundness Theorem (Theorem 23): non- 
wellfounded proofs of HTC are sound for standard semantics. The proof is rather 
more involved than usual soundness arguments in cyclic proof theory, due to the 
richer structure of hypersequents and the corresponding progress criterion. Our 
second main result is the Simulation Theorem (Theorem 28): HTC is complete 
for PDL over the standard translation, by simulating a cut-free cyclic system 
for the latter. This result can be seen as a formal interpretation of cyclic modal 
proof theory within cyclic predicate proof theory, in the spirit of [21,22]. 

To simplify the exposition, we shall mostly focus on equality-free TCL and 
‘identity-free’ PDL in this paper, though all our results hold also for the ‘reflexive’ 
extensions of both logics. We discuss these extensions in Sect. 7, and present 
further insights and conclusions in Sect. 8. Full proofs and further examples not 
included here (due to space constraints) can be found in [13]. 


2 Preliminaries 


We shall work with a fixed first-order vocabulary consisting of a countable set 
Pr of unary predicate symbols, written p,q, etc., and of a countable set Rel of 
binary relation symbols, written a,b, etc. We shall generally reserve the word 
‘predicate’ for unary and ‘relation’ for binary. We could include further relational 
symbols too, of higher arity, but choose not to in order to calibrate the semantics 
of both our modal and predicate settings. 

We build formulas from this language differently in the modal and predicate 
settings, but all our formulas may be formally evaluated within structures: 
Definition 1 (Structures). A structure M consists of a set D, called the
domain of M, which we sometimes denote by |M|; a subset p^M ⊆ D for each
p ∈ Pr; and a subset a^M ⊆ D × D for each a ∈ Rel.


2.1 Transitive Closure Logic 


In addition to the language introduced at the beginning of this section, in the 
predicate setting we further make use of a countable set of function symbols, 
written f^i, g^j, etc., where the superscripts i, j ∈ ℕ indicate the arity of the
function symbol and may be omitted when it is not ambiguous. Nullary function
symbols (aka constant symbols) are written c, d, etc. We shall also make use of
variables, written x,y, etc., typically bound by quantifiers. Terms, written s, t, 
etc., are generated as usual from variables and function symbols by function 
application. A term is closed if it has no variables. 

We consider the usual syntax for first-order logic formulas over our language, 
with an additional operator for transitive closure (and its dual). Formally, TCL 
formulas, written A, B, etc., are generated as follows: 


$$\begin{array}{rcl}
A, B & ::= & p(t) \mid \bar p(t) \mid a(s,t) \mid \bar a(s,t) \mid (A \wedge B) \mid (A \vee B) \mid \forall x A \mid \exists x A \\
     & \mid & \mathrm{TC}(\lambda x, y.A)(s,t) \mid \overline{\mathrm{TC}}(\lambda x, y.A)(s,t)
\end{array}$$

When variables x, y are clear from context, we may write TC(A(x, y))(s, t) or
TC(A)(s, t) instead of TC(λx, y.A)(s, t), as an abuse of notation, and similarly
for T̄C. We may write A[t/x] for the formula obtained from A by replacing every
free occurrence of the variable x by the term t. We have included both TC and
T̄C as primitive operators, so that we can reduce negation to atomic formulas, as
shown below. This will eventually allow a one-sided formulation of proofs.


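Definition 2 (Duality). For a formula A we define its complement, Ā, by:

$$\begin{array}{llll}
\overline{p(t)} := \bar p(t) & \overline{a(s,t)} := \bar a(s,t) & \overline{A \wedge B} := \bar A \vee \bar B & \overline{\mathrm{TC}(A)(s,t)} := \overline{\mathrm{TC}}(\bar A)(s,t) \\
\overline{\bar p(t)} := p(t) & \overline{\bar a(s,t)} := a(s,t) & \overline{A \vee B} := \bar A \wedge \bar B & \overline{\overline{\mathrm{TC}}(A)(s,t)} := \mathrm{TC}(\bar A)(s,t) \\
\overline{\forall x A} := \exists x \bar A & \overline{\exists x A} := \forall x \bar A & &
\end{array}$$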
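Definition 2 is a plain structural recursion. The Python sketch below implements it over a tuple encoding of TCL formulas that is our own (hypothetical, chosen for brevity); it makes it easy to confirm that complementation is an involution, which is what lets negation be pushed all the way down to atoms.

```python
from typing import Tuple

# Hypothetical tuple encoding of TCL formulas:
#   ("atom", symbol, positive, args)   for p(t), p̄(t), a(s, t), ā(s, t)
#   ("and", A, B), ("or", A, B), ("forall", x, A), ("exists", x, A)
#   ("TC", x, y, A, s, t), ("TCbar", x, y, A, s, t)
Formula = Tuple

DUAL = {"and": "or", "or": "and", "forall": "exists", "exists": "forall",
        "TC": "TCbar", "TCbar": "TC"}


def complement(f: Formula) -> Formula:
    """Complement as in Definition 2: swap each connective with its dual."""
    tag = f[0]
    if tag == "atom":
        _, symbol, positive, args = f
        return ("atom", symbol, not positive, args)
    if tag in ("and", "or"):
        return (DUAL[tag], complement(f[1]), complement(f[2]))
    if tag in ("forall", "exists"):
        return (DUAL[tag], f[1], complement(f[2]))
    if tag in ("TC", "TCbar"):
        _, x, y, body, s, t = f
        # The bound variables and the arguments s, t are untouched.
        return (DUAL[tag], x, y, complement(body), s, t)
    raise ValueError(f"unknown connective: {tag!r}")


if __name__ == "__main__":
    f = ("TC", "x", "y", ("atom", "a", True, ("x", "y")), "s", "t")
    print(complement(f))
```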


We shall employ standard logical abbreviations, e.g. A ⊃ B for Ā ∨ B.
We may evaluate formulas with respect to a structure, but we need additional 
data for interpreting function symbols: 


Definition 3 (Interpreting function symbols). Let M be a structure with
domain D. An interpretation is a map ρ that assigns to each function symbol
f^n a function D^n → D. We may extend any interpretation ρ to an action on
(closed) terms by setting recursively ρ(f(t_1, ..., t_n)) := ρ(f)(ρ(t_1), ..., ρ(t_n)).


We only consider standard semantics in this work: TC (and T̄C) is always
interpreted as the real transitive closure (and its dual) in a structure, rather 
than being axiomatised by some induction (and coinduction) principle. 
Definition 4 (Semantics). Given a structure M with domain D and an
interpretation ρ, the judgement M, ρ ⊨ A is defined as usual for first-order logic with
the following additional clauses for TC and T̄C:¹

- M, ρ ⊨ TC(A(x, y))(s, t) if there are v_0, ..., v_{n+1} ∈ D with ρ(s) = v_0 and
ρ(t) = v_{n+1}, such that for every i ≤ n we have M, ρ ⊨ A(v_i, v_{i+1}).

- M, ρ ⊨ T̄C(A(x, y))(s, t) if for all v_0, ..., v_{n+1} ∈ D with ρ(s) = v_0 and
ρ(t) = v_{n+1}, there is some i ≤ n such that M, ρ ⊨ A(v_i, v_{i+1}).

If M, ρ ⊨ A for all M and ρ, we simply write ⊨ A.


Remark 5 (TC and T̄C as least and greatest fixed points). As expected, we
have M, ρ ⊭ TC(A)(s, t) just if M, ρ ⊨ T̄C(Ā)(s, t), and so the two operators
are semantically dual. Thus, TC and T̄C duly correspond to least and greatest
fixed points, respectively, satisfying in any model:

$$\mathrm{TC}(A)(s,t) \leftrightarrow A(s,t) \vee \exists z\,(A(s,z) \wedge \mathrm{TC}(A)(z,t)) \qquad (1)$$
$$\overline{\mathrm{TC}}(A)(s,t) \leftrightarrow A(s,t) \wedge \forall z\,(A(s,z) \vee \overline{\mathrm{TC}}(A)(z,t)) \qquad (2)$$

Let us point out that our T̄C operator is not the same as Cohen and Rowe’s
transitive ‘co-closure’ operator TC^op in [10], but rather the De Morgan dual
of TC. In the presence of negation, TC and T̄C are indeed interdefinable, cf.
Definition 2.
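On finite structures both operators can be computed directly, which shows the fixed-point equations (1) and (2) at work. In the Python sketch below, the formula A(x, y) is simplified to a Python predicate on pairs of domain elements (our assumption, standing in for evaluation under ρ); TC is obtained by iterating R ↦ A ∪ (A ; R) to its least fixed point, and T̄C via the duality of Remark 5.

```python
from itertools import product
from typing import Callable, Iterable, Set, Tuple

Pair = Tuple[int, int]
Step = Callable[[int, int], bool]  # stands in for the formula A(x, y)


def tc(domain: Iterable[int], step: Step) -> Set[Pair]:
    """TC(A) as the least fixed point of R = A ∪ (A ; R), cf. (1)."""
    dom = list(domain)
    base = {(u, v) for u, v in product(dom, repeat=2) if step(u, v)}
    closure = set(base)
    while True:
        # One more unfolding of (1): pairs (u, w) with A(u, v) and R(v, w).
        new = {(u, w) for (u, v) in base for (v2, w) in closure if v == v2} - closure
        if not new:
            return closure
        closure |= new


def tc_bar(domain: Iterable[int], step: Step) -> Set[Pair]:
    """Dual operator: TC-bar(A)(s, t) holds iff TC(complement of A)(s, t) fails."""
    dom = list(domain)
    complement_tc = tc(dom, lambda u, v: not step(u, v))
    return set(product(dom, repeat=2)) - complement_tc


def satisfies_fixed_points(domain: Iterable[int], step: Step) -> bool:
    """Check the equivalences (1) and (2) pointwise on a finite structure."""
    dom = list(domain)
    closure, dual = tc(dom, step), tc_bar(dom, step)
    for s, t in product(dom, repeat=2):
        eq1 = ((s, t) in closure) == (
            step(s, t) or any(step(s, z) and (z, t) in closure for z in dom))
        eq2 = ((s, t) in dual) == (
            step(s, t) and all(step(s, z) or (z, t) in dual for z in dom))
        if not (eq1 and eq2):
            return False
    return True


if __name__ == "__main__":
    def succ(u: int, v: int) -> bool:
        return v == u + 1  # A(x, y): "y is the successor of x"

    print(sorted(tc(range(4), succ)))  # all pairs (u, v) with u < v
    print(satisfies_fixed_points(range(4), succ))
```

Computing T̄C through the complement relation mirrors the semantic clause of Definition 4: T̄C(A)(s, t) holds exactly when no chain from s to t avoids A at every step.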


2.2 Cohen-Rowe Cyclic System for TCL 


Cohen and Rowe proposed in [9,11] a non-wellfounded system for TCL that 
extends a usual sequent calculus LK₌ for first-order logic with equality and
substitution by rules for TC inspired by its characterisation as a least fixed
point, cf. (1).² Note that the presence of the substitution rule is critical for the
notion of ‘regularity’ in predicate cyclic proof theory. The resulting notions of 
non-wellfounded and cyclic proofs are formulated similarly to those for first-order 
logic with (ordinary) inductive definitions [8]: 


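Definition 6 (Sequent system). TC_G is the extension of LK₌ by the rules:

$$\frac{\Gamma, A(s,t)}{\Gamma, \mathrm{TC}(A)(s,t)} \qquad
\frac{\Gamma, A(s,r) \qquad \Gamma, \mathrm{TC}(A)(r,t)}{\Gamma, \mathrm{TC}(A)(s,t)} \qquad
\frac{\Gamma, A(s,t) \qquad \Gamma, A(s,c), \overline{\mathrm{TC}}(A)(c,t)}{\Gamma, \overline{\mathrm{TC}}(A)(s,t)}\ c \text{ fresh} \qquad (3)$$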


TC_G-preproofs are possibly infinite trees of sequents generated by the rules of
TC_G. A preproof is regular if it has only finitely many distinct sub-preproofs.


1 Note that we are including ‘parameters from the model’ in formulas here. Formally,
this means each v ∈ D is construed as a constant symbol for which ρ(v) = v.

2 Cohen and Rowe’s system is originally called RTC_G, rather using a ‘reflexive’
version RTC of the TC operator. However, this (and its rules) can be encoded (and
simulated) by defining RTC(λx, y.A)(s, t) := TC(λx, y.(x = y ∨ A))(s, t).
The notion of ‘correct’ non-wellfounded proof is obtained by a standard
progressing criterion in cyclic proof theory. We shall not go into details here, as they
are beyond the scope of this work, but refer the reader to those original works (as
well as [13] for our current variant). Let us write ⊢_cyc for their notion of cyclic
provability using the above rules, cf. [9,11]. A standard infinite descent
countermodel argument yields:


Proposition 7 (Soundness, [9,11]). If TC_G ⊢_cyc A then ⊨ A.


In fact, this result is subsumed by our main soundness result for HTC
(Theorem 23) and its simulation of TC_G (Theorem 19). In the presence of cut, a form
of converse of Proposition 7 holds: cyclic TC_G proofs are ‘Henkin complete’,
i.e. complete for all models of a particular axiomatisation of TCL based on
(co)induction principles for TC (and T̄C) [9,11]. However, the counterexample
we present in the next section implies that cut is not eliminable (Corollary 13). 


3 Interlude: Motivation from PDL and Kleene Algebra 


Given the TCL sequent system proposed by Cohen and Rowe, why do we propose 
a hypersequential system? Our main argument is that proof search in TC_G is
rather weak, to the extent that cut-free cyclic proofs are unable to simulate a 
basic (cut-free) system for modal logic PDL (regardless of proof search strategy). 
At least one motivation here is to ‘lift’ the standard translation from cut-free 
cyclic proofs for PDL to cut-free cyclic proofs in an adequate system for TCL. 


3.1 Identity-Free PDL 


Identity-free propositional dynamic logic (PDL⁺) is a version of the modal logic
PDL without tests or identity, thereby admitting an ‘equality-free’ standard
translation into predicate logic. Formally, PDL⁺ formulas, written A, B, etc.,
and programs, written α, β, etc., are generated by the following grammars:

$$A, B ::= p \mid \bar p \mid (A \wedge B) \mid (A \vee B) \mid [\alpha]A \mid \langle\alpha\rangle A$$
$$\alpha, \beta ::= a \mid (\alpha;\beta) \mid (\alpha \cup \beta) \mid \alpha^+$$

We sometimes simply write αβ instead of α;β, and (α)A for a formula that is
either ⟨α⟩A or [α]A.


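Definition 8 (Duality). For a formula A we define its complement, Ā, by:

$$\overline{p} := \bar p \qquad \overline{\bar p} := p \qquad \overline{A \wedge B} := \bar A \vee \bar B \qquad \overline{A \vee B} := \bar A \wedge \bar B \qquad \overline{[\alpha]A} := \langle\alpha\rangle \bar A \qquad \overline{\langle\alpha\rangle A} := [\alpha]\bar A$$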


We evaluate PDL⁺ formulas using the traditional relational semantics of
modal logic, by associating each program with a binary relation in a structure. 
Again, we only consider standard semantics, in the sense that the + operator is 
interpreted as the real transitive closure within a structure. 
Definition 9 (Semantics). For structures M with domain D, elements v ∈
D, programs α and formulas A, we define α^M ⊆ D × D and the judgement
M, v ⊨ A as follows:

- a^M is already given in the specification of M, cf. Definition 1.

- (α;β)^M := {(u, v) : there is w ∈ D s.t. (u, w) ∈ α^M and (w, v) ∈ β^M}.

- (α ∪ β)^M := {(u, v) : (u, v) ∈ α^M or (u, v) ∈ β^M}.

- (α⁺)^M := {(u, v) : there are w_0, ..., w_{n+1} ∈ D s.t. u = w_0, v =
w_{n+1} and, for every i ≤ n, (w_i, w_{i+1}) ∈ α^M}.

- M, v ⊨ p if v ∈ p^M.

- M, v ⊨ p̄ if v ∉ p^M.

- M, v ⊨ A ∧ B if M, v ⊨ A and M, v ⊨ B.

- M, v ⊨ A ∨ B if M, v ⊨ A or M, v ⊨ B.

- M, v ⊨ [α]A if for all (v, w) ∈ α^M we have M, w ⊨ A.

- M, v ⊨ ⟨α⟩A if there is (v, w) ∈ α^M with M, w ⊨ A.

If M, v ⊨ A for all M and v ∈ |M|, then we write ⊨ A.


Note that we are overloading the satisfaction symbol ⊨ here, for both PDL⁺
and TCL. This should never cause confusion, in particular since the two notions 
of satisfaction are ‘compatible’ as we shall now see. 
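Definition 9 can be executed directly on finite structures. The Python sketch below is an illustration under our own encoding (tuples for programs and formulas, a small `Structure` class); it evaluates formula (4) from Sect. 3.3 at every point of a sample structure, where, being valid, it must hold.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


class Structure:
    """A finite structure: a domain, predicates p^M and relations a^M (Definition 1)."""

    def __init__(self, domain: Set[int], preds: Dict[str, Set[int]],
                 rels: Dict[str, Set[Pair]]):
        self.domain, self.preds, self.rels = domain, preds, rels


def rel(prog: Tuple, m: Structure) -> Set[Pair]:
    """α^M for ("atom", a), ("seq", α, β), ("union", α, β) and ("plus", α)."""
    tag = prog[0]
    if tag == "atom":
        return set(m.rels.get(prog[1], set()))
    if tag == "seq":
        r1, r2 = rel(prog[1], m), rel(prog[2], m)
        return {(u, v) for (u, w) in r1 for (w2, v) in r2 if w == w2}
    if tag == "union":
        return rel(prog[1], m) | rel(prog[2], m)
    if tag == "plus":
        base = rel(prog[1], m)
        closure = set(base)
        while True:  # extend paths by one α-step until nothing new appears
            new = {(u, w) for (u, v) in closure for (v2, w) in base if v == v2} - closure
            if not new:
                return closure
            closure |= new
    raise ValueError(f"unknown program: {tag!r}")


def sat(m: Structure, v: int, a: Tuple) -> bool:
    """M, v |= A for ("p", p), ("pbar", p), ("and"|"or", A, B), ("box"|"dia", α, A)."""
    tag = a[0]
    if tag == "p":
        return v in m.preds.get(a[1], set())
    if tag == "pbar":
        return v not in m.preds.get(a[1], set())
    if tag == "and":
        return sat(m, v, a[1]) and sat(m, v, a[2])
    if tag == "or":
        return sat(m, v, a[1]) or sat(m, v, a[2])
    if tag in ("box", "dia"):
        successors = [w for (u, w) in rel(a[1], m) if u == v]
        if tag == "box":
            return all(sat(m, w, a[2]) for w in successors)
        return any(sat(m, w, a[2]) for w in successors)
    raise ValueError(f"unknown connective: {tag!r}")


A_PROG, B_PROG = ("atom", "a"), ("atom", "b")
A_PLUS = ("plus", A_PROG)
LHS_4 = ("plus", ("union", ("seq", A_PROG, A_PROG), ("seq", A_PROG, ("seq", B_PROG, A_PROG))))
RHS_4 = ("seq", A_PLUS, ("union", ("plus", ("seq", B_PROG, A_PLUS)), A_PROG))
# (4) written with ⊃ unfolded: [(aa ∪ aba)⁺]p̄ ∨ ⟨a⁺((ba⁺)⁺ ∪ a)⟩p
FORMULA_4 = ("or", ("box", LHS_4, ("pbar", "p")), ("dia", RHS_4, ("p", "p")))

M1 = Structure({0, 1, 2}, {"p": {0}}, {"a": {(0, 1), (1, 2), (2, 0)}, "b": {(1, 1), (2, 1)}})

if __name__ == "__main__":
    print(all(sat(M1, v, FORMULA_4) for v in M1.domain))
```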


3.2 The Standard Translation 


The so-called ‘standard translation’ of modal logic into predicate logic is induced 
by reading the semantics of modal logic as first-order formulas. We now give a 
natural extension of this that interprets PDL⁺ into TCL. At the logical level our
translation coincides with the usual one for basic modal logic; our translation of
programs, as expected, requires the TC operator to interpret the + of PDL⁺.


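Definition 10. For PDL⁺ formulas A and programs α, we define the standard
translations ST(A)(x) and ST(α)(x, y) as TCL-formulas with free variables x
and x, y, resp., inductively as follows:

$$\begin{array}{ll}
\mathrm{ST}(p)(x) := p(x) & \mathrm{ST}(a)(x,y) := a(x,y) \\
\mathrm{ST}(\bar p)(x) := \bar p(x) & \mathrm{ST}(\alpha \cup \beta)(x,y) := \mathrm{ST}(\alpha)(x,y) \vee \mathrm{ST}(\beta)(x,y) \\
\mathrm{ST}(A \vee B)(x) := \mathrm{ST}(A)(x) \vee \mathrm{ST}(B)(x) & \mathrm{ST}(\alpha;\beta)(x,y) := \exists z\,(\mathrm{ST}(\alpha)(x,z) \wedge \mathrm{ST}(\beta)(z,y)) \\
\mathrm{ST}(A \wedge B)(x) := \mathrm{ST}(A)(x) \wedge \mathrm{ST}(B)(x) & \mathrm{ST}(\alpha^+)(x,y) := \mathrm{TC}(\mathrm{ST}(\alpha))(x,y) \\
\mathrm{ST}(\langle\alpha\rangle A)(x) := \exists y\,(\mathrm{ST}(\alpha)(x,y) \wedge \mathrm{ST}(A)(y)) & \\
\mathrm{ST}([\alpha]A)(x) := \forall y\,(\overline{\mathrm{ST}(\alpha)(x,y)} \vee \mathrm{ST}(A)(y)) &
\end{array}$$

where TC(ST(α)) is shorthand for TC(λx, y.ST(α)(x, y)).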
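The translation is easy to mechanise. The following Python sketch prints ST(A)(x) as a string, drawing fresh bound variables z0, z1, ... from a counter. Two simplifications are ours and should be read as assumptions: p̄(x) is printed as ¬p(x), and the complement of ST(α)(x, y) in the [α] clause is printed as ¬(...) rather than being pushed to the atoms as Definition 2 would do.

```python
from itertools import count
from typing import Iterator, Tuple


def st_program(prog: Tuple, x: str, y: str, fresh: Iterator[str]) -> str:
    """ST(α)(x, y) for ("atom", a), ("union", α, β), ("seq", α, β), ("plus", α)."""
    tag = prog[0]
    if tag == "atom":
        return f"{prog[1]}({x},{y})"
    if tag == "union":
        return f"({st_program(prog[1], x, y, fresh)} ∨ {st_program(prog[2], x, y, fresh)})"
    if tag == "seq":
        z = next(fresh)  # the intermediate point of α;β
        return f"∃{z}({st_program(prog[1], x, z, fresh)} ∧ {st_program(prog[2], z, y, fresh)})"
    if tag == "plus":
        u, w = next(fresh), next(fresh)  # variables bound by the λ of TC
        return f"TC(λ{u},{w}.{st_program(prog[1], u, w, fresh)})({x},{y})"
    raise ValueError(f"unknown program: {tag!r}")


def st_formula(a: Tuple, x: str, fresh: Iterator[str]) -> str:
    """ST(A)(x) for ("p", p), ("pbar", p), ("and"|"or", A, B), ("dia"|"box", α, A)."""
    tag = a[0]
    if tag == "p":
        return f"{a[1]}({x})"
    if tag == "pbar":
        return f"¬{a[1]}({x})"
    if tag in ("and", "or"):
        op = "∧" if tag == "and" else "∨"
        return f"({st_formula(a[1], x, fresh)} {op} {st_formula(a[2], x, fresh)})"
    if tag == "dia":
        y = next(fresh)
        return f"∃{y}({st_program(a[1], x, y, fresh)} ∧ {st_formula(a[2], y, fresh)})"
    if tag == "box":
        y = next(fresh)
        return f"∀{y}(¬{st_program(a[1], x, y, fresh)} ∨ {st_formula(a[2], y, fresh)})"
    raise ValueError(f"unknown connective: {tag!r}")


def st(a: Tuple, x: str = "x") -> str:
    fresh = (f"z{i}" for i in count())
    return st_formula(a, x, fresh)


if __name__ == "__main__":
    print(st(("dia", ("seq", ("atom", "a"), ("atom", "b")), ("p", "p"))))
```

For ⟨a;b⟩p this yields ∃z0(∃z1(a(x,z1) ∧ b(z1,z0)) ∧ p(z0)), i.e. the existential-conjunctive shape that reappears when we discuss the ⟨;⟩ rule in Sect. 4.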


It is routine to show that ST(Ā)(x) is the complement of ST(A)(x), by structural induction on
A, justifying our overloading of the notation Ā in both TCL and PDL⁺. Yet
another advantage of using the same underlying language for both the modal and 
predicate settings is that we can state the following (expected) result without 
the need for encodings, following by a routine structural induction (see, e.g., [5]): 


Theorem 11. For PDL⁺ formulas A, we have M, v ⊨ A iff M ⊨ ST(A)(v).
3.3 Cohen-Rowe System is not Complete for PDL⁺


PDL⁺ admits a standard cut-free cyclic proof system LPD⁺ (see Sect. 6.1) which
is both sound and complete (cf. Theorem 30). However, a shortfall of TC_G is that
it is unable to simulate LPD⁺ without cut. In fact, we can say something stronger:


Theorem 12 (Incompleteness). There exists a PDL⁺ formula A such that
⊨ A but TC_G ⊬_cyc ST(A)(x) (in the absence of cut).


This means not only that TC_G is unable to locally simulate the rules of LPD⁺
without cut, but also that there are some validities for which there are no cut-free
cyclic proofs at all in TC_G. One example of such a formula is:

$$\langle (aa \cup aba)^+ \rangle p \supset \langle a^+((ba^+)^+ \cup a) \rangle p \qquad (4)$$


A detailed proof of this is found in [13], but let us briefly discuss it here. First,
the formula above is not artificial: it is derived from the well-known PDL validity
⟨(a ∪ b)*⟩p ⊃ ⟨a*(ba*)*⟩p by identity-elimination. This in turn is essentially a
theorem of relational algebra, namely (a ∪ b)* ≤ a*(ba*)*, which is often used
to eliminate ∪ in (sums of) regular expressions. The same equation was one of
those used by Das and Pous in [14] to show that the sequent system LKA for
Kleene Algebra is cut-free cyclic incomplete.
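Read as regular expressions over the letters a and b, both sides of (4) denote languages, and if every word of the left-hand language also belongs to the right-hand one, then every a/b-path witnessing the left-hand program in a structure also witnesses the right-hand one. The Python sketch below checks this inclusion, and the equation (a ∪ b)* = a*(ba*)*, for all words up to a fixed length. It is a bounded sanity check under our own regex renderings of the programs, not a proof.

```python
import re
from itertools import product
from typing import Iterator


def words(alphabet: str, max_len: int) -> Iterator[str]:
    """All words over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def included(lhs: str, rhs: str, max_len: int, alphabet: str = "ab") -> bool:
    """Bounded check of L(lhs) ⊆ L(rhs); a sanity check, not a proof."""
    return all(re.fullmatch(rhs, w) is not None
               for w in words(alphabet, max_len) if re.fullmatch(lhs, w))


# Regex renderings (ours) of the programs in (4) and of (a ∪ b)* = a*(ba*)*.
LHS_4 = r"(?:aa|aba)+"         # (aa ∪ aba)⁺
RHS_4 = r"a+(?:(?:ba+)+|a)"    # a⁺((ba⁺)⁺ ∪ a)
STAR_L = r"(?:a|b)*"           # (a ∪ b)*
STAR_R = r"a*(?:ba*)*"         # a*(ba*)*

if __name__ == "__main__":
    print(included(LHS_4, RHS_4, 10))
    print(included(STAR_L, STAR_R, 10) and included(STAR_R, STAR_L, 10))
```

The converse inclusion for (4) fails (for instance, aaba lies only on the right), which is consistent with (4) being an implication rather than an equivalence.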

The argument that TC_G ⊬_cyc ST((4))(x) is much more involved than the one
from [14], due to the fact that we are working in predicate logic, but the underlying
basic idea is similar. At a very high level, the RHS of (4) (viewed as a relational
inequality) is translated to an existential formula ∃z(ST(a⁺)(x, z) ∧ ST((ba⁺)⁺ ∪
a)(z, y)) that, along some branch (namely the one that always chooses aa when
decomposing the LHS of (4)), can never be instantiated while remaining valid.
This branch witnesses the non-regularity of any proof. However, ST((4))(x) is
cyclically provable in TC_G with cut, so an immediate consequence of Theorem 12 is:


Corollary 13. The class of cyclic proofs of TC_G does not enjoy
cut-admissibility.


4 Hypersequent Calculus for TCL 


Let us take a moment to examine why any ‘local’ simulation of LPD⁺ by TC_G
fails, in order to motivate the main system that we shall present. The program
rules, in particular the ⟨⟩-rules, require a form of deep inference to be correctly
simulated, over the standard translation. For instance, let us consider the action
of the standard translation on two rules we shall see later in LPD⁺ (cf. Sect. 6.1):


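$$\frac{\Gamma, \langle a_0\rangle p}{\Gamma, \langle a_0 \cup a_1\rangle p}
\quad\rightsquigarrow\quad
\frac{\mathrm{ST}(\Gamma)(c),\ \exists x\,(a_0(c,x) \wedge p(x))}{\mathrm{ST}(\Gamma)(c),\ \exists x\,((a_0(c,x) \vee a_1(c,x)) \wedge p(x))}$$

$$\frac{\Gamma, \langle a\rangle\langle b\rangle p}{\Gamma, \langle a;b\rangle p}
\quad\rightsquigarrow\quad
\frac{\mathrm{ST}(\Gamma)(c),\ \exists y\,(a(c,y) \wedge \exists x\,(b(y,x) \wedge p(x)))}{\mathrm{ST}(\Gamma)(c),\ \exists x\,(\exists y\,(a(c,y) \wedge b(y,x)) \wedge p(x))}$$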
[Figure: rules init, wk, σ, ∪, id, ∧, ∨, inst, ∃, ∀, TC and T̄C of HTC]

Fig. 1. Hypersequent calculus HTC. σ is a ‘substitution’ map from constants to terms
and a renaming of other function symbols and variables.


The first case above suggests that any system to which the standard translation
lifts must be able to reason underneath ∃ and ∧, so that the inference on the
subformula underneath them is ‘accessible’ to the prover. The second case above
suggests that the existential-conjunctive meta-structure necessitated by the first
case should admit basic equivalences, in particular certain prenexing. This section
is devoted to the incorporation of these ideas (and necessities) into a bona fide proof system.


4.1 A System for Predicate Logic via Annotated Hypersequents 


An annotated cedent, or simply cedent, written S, S′ etc., is an expression {Γ}^x,
where Γ is a set of formulas and the annotation x is a set of variables. We
sometimes construe annotations as lists rather than sets when it is convenient,
e.g. when taking them as inputs to a function.

Each cedent may be intuitively read as a TCL formula, under the following
interpretation: fm({Γ}^{x_1, ..., x_n}) := ∃x_1 ... ∃x_n ⋀Γ. When x = ∅ there are
no existential quantifiers above, and when Γ = ∅ we simply identify ⋀Γ with
⊤. We also sometimes write simply A for the annotated cedent {A}^∅.

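A hypersequent, written S, S′, etc., is a set of annotated cedents. Each hypersequent
may be intuitively read as the disjunction of its cedents. Namely we set:

$$\mathrm{fm}(\{\Gamma_1\}^{\mathbf{x}_1}, \dots, \{\Gamma_n\}^{\mathbf{x}_n}) := \mathrm{fm}(\{\Gamma_1\}^{\mathbf{x}_1}) \vee \cdots \vee \mathrm{fm}(\{\Gamma_n\}^{\mathbf{x}_n}).$$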
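As an illustration of these two readings, here is a small Python sketch that renders a cedent and a hypersequent as formula strings. The following are assumptions made for the example: representing a hypersequent as a list of `(formulas, annotation)` pairs, the helper names, and reading the empty hypersequent as $\bot$ (the empty disjunction).

```python
def fm_cedent(formulas, annotation):
    """fm({Γ}^x): existentially close the conjunction of Γ over the variables x."""
    body = " ∧ ".join(formulas) if formulas else "⊤"  # the empty conjunction is ⊤
    if not annotation:
        return body  # x = ∅: no existential quantifiers
    prefix = "".join(f"∃{x}" for x in annotation)
    return f"{prefix}({body})"


def fm_hypersequent(cedents):
    """fm of a hypersequent: the disjunction of its cedents (assumed ⊥ if empty)."""
    if not cedents:
        return "⊥"
    return " ∨ ".join(f"({fm_cedent(gamma, xs)})" for gamma, xs in cedents)
```

Here the annotation is taken as a list, as allowed above. Its order only affects the printed quantifier prefix, since adjacent existential quantifiers commute.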


Definition 14 (System). The rules of HTC are given in Fig. 1. A HTC pre- 
proof is a (possibly infinite) derivation tree generated by the rules of HTC. A 
preproof is regular if it has only finitely many distinct subproofs. 


Our hypersequential system is somewhat more refined than usual sequent
systems for predicate logic. E.g., the usual $\exists$ rule is decomposed into $\exists$ and inst,
whereas the usual $\wedge$ rule is decomposed into $\wedge$ and $\cup$. The rules for TC and $\overline{TC}$
are induced directly from their characterisations as fixed points in (1).

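Note that the rules $\overline{TC}$ and $\forall$ introduce, bottom-up, the fresh function symbol
$f$, which plays the role of the Herbrand function of the corresponding $\forall$
quantifier. By Skolemisation, $\forall \mathbf{x} \exists y\, A(\mathbf{x}, y)$ is equisatisfiable with $\forall \mathbf{x}\, A(\mathbf{x}, f(\mathbf{x}))$ when $f$ is fresh.
Dually, by Herbrandisation, $\exists \mathbf{x} \forall y\, A(\mathbf{x}, y)$ is equivalid with $\exists \mathbf{x}\, A(\mathbf{x}, f(\mathbf{x}))$ when $f$ is
fresh. The usual $\forall$ rule of the sequent calculus corresponds
to the case when $\mathbf{x} = \emptyset$.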


4.2 Non-wellfounded Hypersequent Proofs 


Our notion of ancestry, as compared to traditional sequent systems, must account 
for the richer structure of hypersequents: 


Definition 15 (Ancestry). Fix an inference step r, as typeset in Fig. 1. A
formula C in the premiss is an immediate ancestor of a formula C′ in the
conclusion if they have the same colour. If $C, C' \in \Gamma$ then we further require
$C = C'$, and if C, C′ occur in S then we require $C = C'$ and that they occur in the same cedent. A cedent
S in the premiss is an immediate ancestor of a cedent S′ in the conclusion if
some formula in S is an immediate ancestor of some formula in S′.


Immediate ancestry on both formulas and cedents is a binary relation, induc- 
ing a directed graph whose paths form the basis of our correctness condition: 


Definition 16 ((Hyper)traces). A hypertrace is a maximal path in the graph 
of immediate ancestry on cedents. A trace is a maximal path in the graph of 
immediate ancestry on formulas. 


Definition 17 (Progress and proofs). Fix a preproof D. An (infinite) trace
$(F_i)_{i<\omega}$ is progressing if there is k such that, for all $i > k$, $F_i$ has the form
$\overline{TC}(A)(s_i, t_i)$, and the trace is infinitely often principal.³ An (infinite) hypertrace H is
progressing if every infinite trace within it is progressing. An (infinite) branch is
progressing if it has a progressing hypertrace. D is a proof if every infinite branch
is progressing. If, furthermore, D is regular, we call it a cyclic proof.

We write $\mathrm{HTC} \vdash_{\mathrm{nwf}} S$ (or $\mathrm{HTC} \vdash_{\mathrm{cyc}} S$) if there is a proof (or cyclic proof,
respectively) of HTC of the hypersequent S.


In usual cyclic systems, checking that a regular preproof is progressing is 
decidable by straightforward reduction to the universality of nondeterministic 
$\omega$-automata, with runs ‘guessing’ a progressing trace along an infinite branch.
Our notion of progress exhibits an extra quantifier alternation: we must guess an 
infinite hypertrace in which every trace is progressing. Nonetheless, by appealing 
to determinisation or alternation, we can still decide our progressing condition: 


Proposition 18. Checking whether a HTC preproof is a proof is decidable by
reduction to universality of $\omega$-regular languages.
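Proposition 18 is proved via $\omega$-automata. As a much smaller illustration of the alternating condition in Definition 17, the Python sketch below considers a single hypertrace that repeats with a fixed period along a cycle of a regular preproof, and checks whether every infinite trace within it progresses. The sketch assumes the period has been unrolled into a finite directed graph whose nodes are formula occurrences. Each occurrence has an edge to each of its immediate ancestors one step up. Each node also records whether it has the form required of $F_i$ in Definition 17 and whether it is principal at that step. This representation and all names are hypothetical. A trace fails to progress exactly when it meets a wrongly-shaped formula infinitely often, or is eventually never principal. In a finite graph, each kind of failure is witnessed by a cycle.

```python
from collections import deque


def on_cycle(graph, v):
    """True if node v lies on a directed cycle of `graph` (dict: node -> successors)."""
    seen, queue = set(), deque(graph[v])
    while queue:
        u = queue.popleft()
        if u == v:
            return True
        if u not in seen:
            seen.add(u)
            queue.extend(graph[u])
    return False


def has_cycle_within(graph, allowed):
    """True if the subgraph induced by the node set `allowed` has a directed cycle."""
    sub = {u: [w for w in graph[u] if w in allowed] for u in allowed}
    return any(on_cycle(sub, u) for u in sub)


def every_trace_progresses(graph, right_form, principal):
    """Check the progress condition on one unrolled period of a hypertrace.

    right_form[v]: v has the form required of F_i in Definition 17.
    principal[v]:  v is principal at its step.
    """
    # Failure 1: some infinite trace meets a wrongly-shaped formula infinitely often.
    if any(not right_form[v] and on_cycle(graph, v) for v in graph):
        return False
    # Failure 2: some infinite trace is eventually never principal.
    idle = {v for v in graph if not principal[v]}
    return not has_cycle_within(graph, idle)
```

This covers only the checking step for one fixed, periodic hypertrace. In general the hypertrace itself must be guessed, and several cycles may interact, which is why Proposition 18 appeals to determinisation or alternation.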


³ In fact, by a simple well-foundedness argument, it is equivalent to say that $(F_i)_{i<\omega}$
is progressing if it is infinitely often principal for a $\overline{TC}$-formula.
As we mentioned earlier, cyclic proofs of HTC are indeed at least as expressive
as those of Cohen and Rowe’s system, by a routine local simulation of rules:


Theorem 19 (Simulating Cohen-Rowe). If $\mathrm{TC}_G \vdash_{\mathrm{cyc}} A$ then $\mathrm{HTC} \vdash_{\mathrm{cyc}} A$.


4.3 Some Examples 


Example 20 (Fixed point identity). The sequent $\{\overline{TC}(a)(c,d)\}^{\emptyset}, \{TC(a)(c,d)\}^{\emptyset}$
is finitely derivable using rule id on TC(a)(c, d) and the init rule. However, we
can also cyclically reduce it to a simpler instance of id. Due to the granularity of
the inference rules of HTC, we have some liberty in how we implement
such a derivation. E.g., the HTC-proof below applies $\overline{TC}$ rules below TC ones,
and delays branching until the ‘end’ of proof search, which is impossible in $\mathrm{TC}_G$.
The only infinite branch, looping on the marked hypersequent, is progressing by the blue hypertrace.
[Derivation garbled in extraction: an HTC-proof of $\{\overline{TC}(a)(c,d)\}^{\emptyset}, \{TC(a)(c,d)\}^{\emptyset}$ built from id, init, TC and $\overline{TC}$ steps, among others.]
This is an example of the more general ‘rule permutations’ available in HTC, 
hinting at a more flexible proof theory (we discuss this further in Sect. 8). 




Example 21 (Transitivity). TC can be proved transitive by way of a cyclic proof
in $\mathrm{TC}_G$ of the sequent $\overline{TC}(a)(c,d), \overline{TC}(a)(d,e), TC(a)(c,e)$. As in the previous
example, we may mimic that proof line by line, but we give a slightly different
one that cannot directly be interpreted as a $\mathrm{TC}_G$ proof:


[Derivation garbled in extraction: an HTC-proof of $\overline{TC}(a)(c,d), \overline{TC}(a)(d,e), TC(a)(c,e)$, using init steps and the proof of Example 20 as a subproof.]


The only infinite branch (except for that from Example 20), looping on the marked hypersequent, is
progressing by the red hypertrace.


Finally, it is pertinent to revisit the ‘counterexample’ (4) that witnessed
incompleteness of $\mathrm{TC}_G$ for PDL$^+$. The following result is, in fact, already implied
by our later completeness result, Theorem 28, but we present it nonetheless:


Proposition 22. $\mathrm{HTC} \vdash_{\mathrm{cyc}} ST((aa \cup aba)^+)(c,d) \supset ST(a^+; ((ba^+)^+ \cup a))(c,d)$.
Proof. We give the required cyclic proof in Fig. 2, using the abbreviations
$\alpha(c,d) := ST(aa \cup aba)(c,d)$ and $\beta(c,d) := ST((ba^+)^+ \cup a)(c,d)$. The only infinite
branch (looping on the marked hypersequent) has a progressing hypertrace, marked in blue.
The hypersequents
$$R = \{\overline{\alpha}(c,d)\}^{\emptyset}, \{\overline{\alpha}(c,d), \overline{TC}(\alpha)(e,d)\}^{\emptyset}, \{TC(a)(c,y), \beta(y,d)\}^{y}$$
and
$$R' = \{\overline{\alpha}(c,d)\}^{\emptyset}, \{\overline{\alpha}(c,d)\}^{\emptyset}, \{TC(a)(c,y), \beta(y,d)\}^{y}$$
have finitary proofs, while
$$P = \{\overline{aba}(c,e)\}^{\emptyset}, \{\overline{TC}(\alpha)(e,d)\}^{\emptyset}, \{TC(a)(c,y), \beta(y,d)\}^{y}$$
has a cyclic proof.


[Fig. 2 derivation garbled in extraction: a cyclic HTC-proof of $\{ST((aa \cup aba)^+)(c,d) \supset ST(a^+; ((ba^+)^+ \cup a))(c,d)\}^{\emptyset}$, built from steps including wk, TC and $\overline{TC}$, and involving the hypersequents R, R′ and P.]

Fig. 2. Cyclic proof for a sequent not cyclically provable in $\mathrm{TC}_G$.


5 Soundness of HTC 


This section is devoted to the proof of the first of our main results: 


Theorem 23 (Soundness). If $\mathrm{HTC} \vdash_{\mathrm{nwf}} S$ then $\models S$.


The argument is quite technical due to the alternating nature of our progress
condition. In particular, the treatment of traces within hypertraces requires a
more fine-grained argument than usual, bespoke to our hypersequential structure.

Throughout this section, we shall fix a HTC preproof D of a hypersequent S.
For practical reasons we shall assume that D is substitution-free (at the cost of
regularity) and that each quantifier in S binds a distinct variable.⁴ We further
assume some structure $M^\times$ and an interpretation $\rho_0$ such that $\rho_0 \not\models S$ (within
$M^\times$). Since each rule is locally sound, by contraposition we can continually
choose ‘false premisses’ to construct an infinite ‘false branch’:


Lemma 24 (Countermodel branch). There is a branch $B^\times = (S_i)_{i<\omega}$ of D
and an interpretation $\rho^\times$ such that, with respect to $M^\times$:


⁴ Note that this convention means we can simply take y = x in the ∃ rule in Fig. 1.


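1. $\rho^\times \not\models S_i$, for all $i < \omega$;

2. Suppose that $S_i$ concludes a $\overline{TC}$ step, as typeset in Fig. 1, and $\rho^\times \models TC(A)(s,t)[\mathbf{d}/\mathbf{x}]$.
Let n be minimal such that there are $d_0, \dots, d_n$ with $\rho^\times \models A(d_j, d_{j+1})$ for all $j < n$,
$\rho^\times(s) = d_0$ and $\rho^\times(t) = d_n$. If $n > 1$, then $\rho^\times(f)(\mathbf{d}) = d_1$,⁵ so that

$$\rho^\times \models A(s, f(\mathbf{x}))[\mathbf{d}/\mathbf{x}] \quad \text{and} \quad \rho^\times \models TC(A)(f(\mathbf{x}), t)[\mathbf{d}/\mathbf{x}].$$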


Unpacking this a little, our interpretation $\rho^\times$ is actually defined as the limit of a
chain of ‘partial’ interpretations $(\rho_i)_{i<\omega}$, with each $\rho_i \not\models S_i$ (within $M^\times$). Note
in particular that, by 2, whenever some $\overline{TC}$-formula is principal, we choose $\rho_{i+1}$
to always assign to it a falsifying path of minimal length (if one exists at all),
with respect to the assignment to variables in its annotation. It is crucial at this
point that our definition of $\rho^\times$ is parametrised by such assignments.
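The choice in item 2 of Lemma 24 can be illustrated on a finite structure. The Python sketch below treats the interpretation of A as a finite binary relation and finds a minimal A-path from s to t by breadth-first search. It then returns the element $d_1$ that is chosen as the value of the Herbrand function f. The rest of that path is a minimal A-path from $d_1$ to t that is exactly one step shorter, which is the local decrease used in the soundness proof below. When n = 1, A(s, t) itself already holds and no witness is needed. Finite relations, integer elements and the function names are assumptions of the example.

```python
from collections import deque


def minimal_path(relation, s, t):
    """A shortest path s = d0, ..., dn = t with n >= 1 in `relation`, or None."""
    succ = {}
    for a, b in relation:
        succ.setdefault(a, []).append(b)
    parent = {}  # parent[b]: predecessor of b on a shortest path from s
    queue = deque()
    for b in succ.get(s, []):  # at least one step: TC is not reflexive
        if b not in parent:
            parent[b] = s
            queue.append(b)
    while queue and t not in parent:
        a = queue.popleft()
        for b in succ.get(a, []):
            if b not in parent:
                parent[b] = a
                queue.append(b)
    if t not in parent:
        return None
    path, node = [t], parent[t]
    while node != s:  # walk the BFS tree back to s
        path.append(node)
        node = parent[node]
    path.append(s)
    return path[::-1]


def herbrand_witness(relation, s, t):
    """d1 on a minimal A-path from s to t when its length n exceeds 1, else None."""
    path = minimal_path(relation, s, t)
    if path is None or len(path) - 1 <= 1:
        return None
    return path[1]
```

For example, with `R = [(0, 1), (1, 2), (2, 3), (0, 2)]`, the minimal path from 0 to 3 is `[0, 2, 3]`, the witness is `2`, and the minimal path from `2` to `3` has length 1 = 2 - 1.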

Let us now fix $B^\times$ and $\rho^\times$ as provided by the Lemma above. Moreover, let us
henceforth assume that D is a proof, i.e. it is progressing, and fix a progressing
hypertrace $H = (\{\Gamma_i\}^{\mathbf{x}_i})_{i<\omega}$ along $B^\times$. In order to carry out an infinite descent
argument, we will need to define a particular trace along this hypertrace that
‘preserves’ falsity, bottom-up. This is delicate since the truth values of formulas
in a trace depend on the assignment of elements to variables in the annotations.
A particular issue here is the instantiation rule inst, which requires us to ‘revise’
whatever assignment of y we may have defined until that point. Thankfully, our
earlier convention on substitution-freeness and uniqueness of bound variables in
D facilitates the convergence of this process to a canonical such assignment:


Definition 25 (Assignment). We define $\delta_H : \bigcup_{i<\omega} \mathbf{x}_i \to |M^\times|$ by $\delta_H(x) := \rho^\times(t)$
if x is instantiated by t in H; otherwise $\delta_H(x)$ is some arbitrary $d \in |M^\times|$.

Note that $\delta_H$ is indeed well-defined, thanks to the convention that each quantifier
in S binds a distinct variable. In particular, each variable x is
instantiated at most once along a hypertrace. Henceforth we shall simply write
$\rho, \delta_H \models A(\mathbf{x})$ instead of $\rho \models A(\delta_H(\mathbf{x}))$. Working with such an assignment ensures
that false formulas along H always have a false immediate ancestor:


Lemma 26 (Falsity through H). If $\rho^\times, \delta_H \not\models F$ for some $F \in \Gamma_i$, then F
has an immediate ancestor $F' \in \Gamma_{i+1}$ with $\rho^\times, \delta_H \not\models F'$.


In particular, regarding the inst rule of Fig. 1, note that if $F \in \Gamma(y)$ then we
can choose $F' = F[t/y]$ which, by definition of $\delta_H$, has the same truth value. By
repeatedly applying this Lemma we obtain:


Proposition 27 (False trace). There exists an infinite trace $\tau^\times = (F_i)_{i<\omega}$
through H such that, for all i, it holds that $M^\times, \rho^\times, \delta_H \not\models F_i$.


We are now ready to prove our main soundness result. 


Proof (of Theorem 23, sketch). Fix the infinite trace $\tau^\times = (F_i)_{i<\omega}$ through H
obtained by Proposition 27. Since $\tau^\times$ is infinite, by definition of HTC proofs, it
needs to be progressing, i.e., it is infinitely often $\overline{TC}$-principal and there is some
$k \in \mathbb{N}$ s.t. for $i > k$ we have that $F_i = \overline{TC}(A)(s_i, t_i)$ for some terms $s_i, t_i$.

⁵ To be clear, we here choose an arbitrary such minimal ‘A-path’.

To each $F_i$, for $i > k$, we associate the natural number $n_i$ measuring the
‘A-distance’ between $s_i$ and $t_i$. Formally, $n_i \in \mathbb{N}$ is least such that there
are $d_0, \dots, d_{n_i} \in |M^\times|$ with $\rho^\times(s_i) = d_0$, $\rho^\times(t_i) = d_{n_i}$, and, for all $j < n_i$,
$\rho^\times, \delta_H \models A(d_j, d_{j+1})$. Our aim is to show that $(n_i)_{i>k}$ has no minimal element,
contradicting well-foundedness of $\mathbb{N}$. For this, we establish the following two local
properties:


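$$\dfrac{}{p,\ \bar{p}}\ \mathrm{id} \qquad \dfrac{\Gamma}{\Gamma, A}\ \mathrm{wk} \qquad \dfrac{\Gamma, A}{\langle a\rangle\Gamma, [a]A}\ \mathrm{k}_a \qquad \dfrac{\Gamma, A \quad \Gamma, B}{\Gamma, A \wedge B}\ \wedge \qquad \dfrac{\Gamma, A_i}{\Gamma, A_0 \vee A_1}\ \vee_i$$

$$\dfrac{\Gamma, \langle\alpha\rangle\langle\beta\rangle A}{\Gamma, \langle\alpha;\beta\rangle A}\ \langle;\rangle \qquad \dfrac{\Gamma, [\alpha][\beta]A}{\Gamma, [\alpha;\beta]A}\ [;] \qquad \dfrac{\Gamma, \langle\alpha_i\rangle A}{\Gamma, \langle\alpha_0 \cup \alpha_1\rangle A}\ \langle\cup\rangle_i \qquad \dfrac{\Gamma, [\alpha]A \quad \Gamma, [\beta]A}{\Gamma, [\alpha \cup \beta]A}\ [\cup]$$

$$\dfrac{\Gamma, \langle\alpha\rangle A}{\Gamma, \langle\alpha^+\rangle A}\ \langle+\rangle_0 \qquad \dfrac{\Gamma, \langle\alpha\rangle\langle\alpha^+\rangle A}{\Gamma, \langle\alpha^+\rangle A}\ \langle+\rangle_1 \qquad \dfrac{\Gamma, [\alpha]A \quad \Gamma, [\alpha][\alpha^+]A}{\Gamma, [\alpha^+]A}\ [+]$$


Fig. 3. Rules of LPD$^+$ ($i \in \{0, 1\}$).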


1. $(n_i)_{i>k}$ is monotone decreasing, i.e., for all $i > k$, we have $n_{i+1} \le n_i$;
2. Whenever $F_i$ is principal, we have $n_{i+1} < n_i$.


So $(n_i)_{i>k}$ is monotone decreasing, by 1, but cannot converge, by 2 and the
definition of progressing trace. Thus $(n_i)_{i>k}$ has no minimal element, yielding
the required contradiction.


6 HTC is Complete for PDL$^+$, Over Standard Translation


In this section we give our next main result:
Theorem 28 (Completeness for PDL$^+$). For a PDL$^+$ formula A, if $\models A$
then $\mathrm{HTC} \vdash_{\mathrm{cyc}} ST(A)(c)$.


The proof is by a direct simulation of a cut-free cyclic system for PDL$^+$ that is
complete. We briefly sketch this system below.


6.1 Circular System for PDL$^+$


The system LPD$^+$, given in Fig. 3, is the natural extension of the usual sequent
calculus for basic multimodal logic K by rules for programs. In Fig. 3, $\langle a\rangle\Gamma$ is
shorthand for $\{\langle a\rangle B : B \in \Gamma\}$. (Regular) preproofs for this system are defined
just like for HTC or $\mathrm{TC}_G$. The notion of ‘immediate ancestor’ is induced by the
indicated colouring: a formula C in a premiss is an immediate ancestor of a
formula C′ in the conclusion if they have the same colour; if $C, C' \in \Gamma$ then we
furthermore require $C = C'$.
Definition 29 (Non-wellfounded proofs). Fix a preproof D of a sequent $\Gamma$.
A thread is a maximal path in its graph of immediate ancestry. We say a thread
is progressing if it has a smallest infinitely often principal formula of the form
$[\alpha^+]A$. D is a proof if every infinite branch has a progressing thread. If D is
regular, we call it a cyclic proof and we may write $\mathrm{LPD}^+ \vdash_{\mathrm{cyc}} \Gamma$.


Soundness of cyclic-LPD$^+$ is established by a standard infinite descent argument.
It is also implied by the soundness of cyclic-HTC (Theorem 23) and the
simulation we are about to give (Theorem 28), though this is somewhat overkill.
Completeness may be established by the game-theoretic approach of Niwiński
and Walukiewicz [23], as done by Lange [20] for PDL (with identity), or by the purely
proof-theoretic techniques of Studer [25]. Either way, both results follow from
a standard embedding of PDL$^+$ into the μ-calculus and its known completeness
results [23,25], by way of a standard ‘proof reflection’ argument: μ-calculus
proofs of the embedding are ‘just’ step-wise embeddings of LPD$^+$ proofs:


Theorem 30 (Soundness and completeness, [20]). Let A be a PDL$^+$ formula.
Then $\models A$ iff $\mathrm{LPD}^+ \vdash_{\mathrm{cyc}} A$.


6.2 A ‘Local’ Simulation of LPD$^+$ by HTC


In this subsection we show that LPD$^+$-preproofs can be stepwise transformed
into HTC-proofs, with respect to the standard translation. In order to produce 
this local simulation, we need a more refined version of the standard translation 
that incorporates the structural elements of hypersequents. 

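Fix a PDL$^+$ formula $A = [\alpha_1]\cdots[\alpha_n]\langle\beta_1\rangle\cdots\langle\beta_m\rangle B$, for $n, m \ge 0$. The hypersequent
translation of A, written HT(A)(c), is defined as:

$$\{\overline{ST}(\alpha_1)(c, d_1)\}^{\emptyset}, \{\overline{ST}(\alpha_2)(d_1, d_2)\}^{\emptyset}, \dots, \{\overline{ST}(\alpha_n)(d_{n-1}, d_n)\}^{\emptyset},$$
$$\{ST(\beta_1)(d_n, y_1), ST(\beta_2)(y_1, y_2), \dots, ST(\beta_m)(y_{m-1}, y_m), ST(B)(y_m)\}^{y_1, \dots, y_m}$$

For $\Gamma = A_1, \dots, A_k$, we write $HT(\Gamma)(c) := HT(A_1)(c), \dots, HT(A_k)(c)$.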
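For intuition about the shape of HT(A)(c), the Python sketch below builds it for the special case in which every $\alpha_i$ and $\beta_j$ is an atomic program and B is a propositional letter. In that case ST(α)(s, t) is just α(s, t) and ST(B)(y) is B(y). Cedents are `(formulas, annotation)` pairs, and `¬` stands in for the complement of a formula. The names d1, d2, … (fresh constants) and y1, y2, … (annotated variables) are assumptions of the example, as is the function name.

```python
def hypersequent_translation(boxes, diamonds, body, world="c"):
    """Sketch of HT([a1]...[an]<b1>...<bm>B)(world) for atomic programs.

    Returns a list of cedents (formulas, annotation); '¬' renders the complement.
    """
    cedents = []
    current = world
    for i, a in enumerate(boxes, start=1):
        d = f"d{i}"  # fresh constant introduced for the i-th box
        cedents.append(([f"¬{a}({current}, {d})"], []))  # one cedent per box
        current = d
    formulas, annotation = [], []
    for j, b in enumerate(diamonds, start=1):
        y = f"y{j}"  # existential variable recorded in the annotation
        formulas.append(f"{b}({current}, {y})")
        annotation.append(y)
        current = y
    formulas.append(f"{body}({current})")
    cedents.append((formulas, annotation))  # all diamonds share one cedent
    return cedents
```

Each box contributes its own cedent, i.e. its own disjunct, with a fresh constant. All diamonds share a single existentially annotated cedent. For example, `hypersequent_translation(["a1", "a2"], ["b1"], "p")` returns `[(["¬a1(c, d1)"], []), (["¬a2(d1, d2)"], []), (["b1(d2, y1)", "p(y1)"], ["y1"])]`.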
Definition 31 (HT-translation). Let D be an LPD$^+$ preproof of A. We shall define
a HTC preproof HT(D)(c) of the hypersequent HT(A)(c) by a local translation
of inference steps. We give only a few of the important cases here, but a full
definition can be found in [13].


- A $\mathrm{k}_a$ step, as typeset in Fig. 3, is translated to:

[HTC derivation garbled in extraction: it derives $HT(\langle a\rangle\Gamma, [a]A)(c)$ from $HT(\Gamma)(d), HT(A)(d)$, using inst and ∪ steps among others.]
where (omitted) left premisses of ∪ steps are simply proved by wk, id, init. In this
and the following cases, we use the notation CT(A)(c) and $\mathbf{x}_A$ for the appropriate
sets of formulas and variables forced by the definition of HT (again, see [13] for
further details).


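- A $\langle\cup\rangle_i$ step (for $i = 0, 1$), as typeset in Fig. 3, is translated to:

$$\dfrac{HT(\Gamma)(c),\ HT(\langle\alpha_i\rangle A)(c)}{HT(\Gamma)(c),\ HT(\langle\alpha_0 \cup \alpha_1\rangle A)(c)}\ \vee_i$$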
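- A $\langle;\rangle$ step, as typeset in Fig. 3, is translated to:

$$\dfrac{\dfrac{HT(\Gamma)(c),\ \{ST(\alpha)(c,z),\ ST(\beta)(z,y),\ CT(A)(y)\}^{z,y,\mathbf{x}_A}}{HT(\Gamma)(c),\ \{ST(\alpha)(c,z) \wedge ST(\beta)(z,y),\ CT(A)(y)\}^{z,y,\mathbf{x}_A}}\ \wedge}{HT(\Gamma)(c),\ \{\exists z(ST(\alpha)(c,z) \wedge ST(\beta)(z,y)),\ CT(A)(y)\}^{y,\mathbf{x}_A}}\ \exists$$

where the topmost hypersequent is $HT(\Gamma)(c), HT(\langle\alpha\rangle\langle\beta\rangle A)(c)$ and the bottommost one is $HT(\Gamma)(c), HT(\langle\alpha;\beta\rangle A)(c)$.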


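- A [+] step, as typeset in Fig. 3, is translated to:

$$\dfrac{\dfrac{\mathcal{E} \qquad \dfrac{\mathcal{E}' \qquad C_3}{C_2}\ \cup}{C_1}\ \cup}{C_0}\ \overline{TC}$$

where

$C_0 = HT(\Gamma)(c),\ \{\overline{TC}(ST(\alpha))(c,d)\}^{\emptyset},\ HT(A)(d)$, i.e. $HT(\Gamma)(c), HT([\alpha^+]A)(c)$;

$C_1 = HT(\Gamma)(c),\ \{\overline{ST}(\alpha)(c,d),\ \overline{ST}(\alpha)(c,f)\}^{\emptyset},\ \{\overline{ST}(\alpha)(c,d),\ \overline{TC}(ST(\alpha))(f,d)\}^{\emptyset},\ HT(A)(d)$;

$C_2 = HT(\Gamma)(c),\ \{\overline{ST}(\alpha)(c,f)\}^{\emptyset},\ \{\overline{ST}(\alpha)(c,d),\ \overline{TC}(ST(\alpha))(f,d)\}^{\emptyset},\ HT(A)(d)$;

$C_3 = HT(\Gamma)(c),\ \{\overline{ST}(\alpha)(c,f)\}^{\emptyset},\ \{\overline{TC}(ST(\alpha))(f,d)\}^{\emptyset},\ HT(A)(d)$, i.e. $HT(\Gamma)(c), HT([\alpha][\alpha^+]A)(c)$;

and $\mathcal{E}$ and $\mathcal{E}'$ derive HT(Γ)(c) and HT([α]A)(c), resp., using wk-steps.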


Note that, formally speaking, the well-definedness of HT(D)(c) in the defi- 
nition above is guaranteed by coinduction: each rule of D is translated into a 
(nonempty) derivation. 


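Remark 32 (Deeper inference). Observe that HTC can also simulate ‘deeper’
program rules than are available in LPD$^+$. E.g. a rule

$$\dfrac{\Gamma, \langle\alpha\rangle\langle\beta_i\rangle A}{\Gamma, \langle\alpha\rangle\langle\beta_0 \cup \beta_1\rangle A}$$

may be simulated too (similarly for $[\cdot]$). E.g. $\langle a^+\rangle\langle b\rangle p \supset \langle a^+\rangle\langle b \cup c\rangle p$ admits a finite
proof in HTC (under ST), rather than a necessarily infinite (but cyclic) one in
LPD$^+$.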


6.3 Justifying Regularity and Progress 
Proposition 33. If D is regular, then so is HT(D)(c). 


Proof. Notice that each rule in D is translated to a finite derivation in HT(D)(c). 
Thus, if D has only finitely many distinct subproofs, then also HT(D)(c) has only 
finitely many distinct subproofs. 
Proposition 34. If D is progressing, then so is HT(D)(c).


Proof (sketch). We need to show that every infinite branch of HT(D)(c) has
a progressing hypertrace. Since the HT translation is defined stepwise on the
individual steps of D, we can associate to each infinite branch B of HT(D)(c)
a unique infinite branch B′ of D. Since D is progressing, let $\tau = (F_i)_{i<\omega}$ be a
progressing thread along B′. By inspecting the rules of LPD$^+$ (and by definition
of progressing thread), for some $k \in \mathbb{N}$, each $F_i$ for $i > k$ has the form
$[\alpha_{i,1}]\cdots[\alpha_{i,n_i}][\alpha^+]A$, for some $n_i \ge 0$. So, for $i > k$, $HT(F_i)(c)$ has the form:


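$$\{\overline{ST}(\alpha_{i,1})(c, d_{i,1})\}^{\emptyset}, \dots, \{\overline{ST}(\alpha_{i,n_i})(d_{i,n_i-1}, d_{i,n_i})\}^{\emptyset}, \{\overline{TC}(ST(\alpha))(d_{i,n_i}, d_i)\}^{\emptyset}, HT(A)(d_i)$$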


By inspection of the HT-translation (Definition 31), whenever $F_{i+1}$ is
an immediate ancestor of $F_i$ in B′, there is a path from the cedent
$\{\overline{TC}(ST(\alpha))(d_{i+1,n_{i+1}}, d_{i+1})\}^{\emptyset}$ to the cedent $\{\overline{TC}(ST(\alpha))(d_{i,n_i}, d_i)\}^{\emptyset}$ in the
graph of immediate ancestry along B. Thus, since $\tau = (F_i)_{i<\omega}$ is a
thread along B′, we have an (infinite) hypertrace of the form
$H_\tau = (\{\Delta_i, \overline{TC}(ST(\alpha))(d_{i,n_i}, d_i)\}^{\emptyset})_{i>k'}$ along B. By construction $\Delta_i = \emptyset$ for infinitely
many $i > k'$, and so $H_\tau$ has just one infinite trace. Moreover, by inspection of
the [+] step in Definition 31, this trace progresses in B every time τ does in B′,
and so progresses infinitely often. Thus, $H_\tau$ is a progressing hypertrace. Since the
choice of the branch B of HT(D)(c) was arbitrary, we are done.


6.4 Putting it all Together 
We can now finally conclude our main simulation theorem: 


Proof (of Theorem 28, sketch). Let A be a PDL$^+$ formula s.t. $\models A$. By the
completeness result for LPD$^+$, Theorem 30, we have that $\mathrm{LPD}^+ \vdash_{\mathrm{cyc}} A$, say by
a cyclic proof D. From here we construct the HTC preproof HT(D)(c) which, by
Propositions 33 and 34, is in fact a cyclic proof of HT(A)(c). Finally, we apply
some basic ∨, ∧, ∃, ∀ steps to obtain a cyclic HTC proof of ST(A)(c).


7 Extension by Equality and Simulating Full PDL 


We now briefly explain how our main results are extended to the ‘reflexive’
version of TCL. We extend the language to allow further atomic formulas of the
form $s = t$ and $s \neq t$. The calculus $\mathrm{HTC}_=$ extends HTC by the rules:


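$$\dfrac{S, \{\Gamma\}^{\mathbf{x}}}{S, \{t = t, \Gamma\}^{\mathbf{x}}}$$
and a rule for $\neq$ with premiss $S, \{\Gamma(s), \Delta(s)\}^{\mathbf{x}}$, whose conclusion (involving $s \neq t$ and $\Delta(t)$) is garbled in extraction.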


The notion of immediate ancestry is colour-coded as in Definition 15, and
the resulting notions of (pre)proof, (hyper)trace and progress are as in Definition 17.
The simulation of Cohen and Rowe’s system $\mathrm{TC}_G$ extends to
their reflexive system, $\mathrm{RTC}_G$, by defining their operator $RTC(\lambda x, y.A)(s,t) :=
TC(\lambda x, y.(x = y \vee A))(s,t)$. Note that, while it is semantically correct to set
$RTC(A)(s,t)$ to be $s = t \vee TC(A)(s,t)$, this encoding does not lift to the Cohen-Rowe
rules for RTC. Understanding that structures interpret = as true equality,
a modular adaptation of the soundness argument for HTC, cf. Sect. 5, yields:


Theorem 35 (Soundness of $\mathrm{HTC}_=$). If $\mathrm{HTC}_= \vdash_{\mathrm{nwf}} S$ then $\models S$.
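The semantic correctness of the RTC encoding mentioned above can be sanity-checked on finite relations. The Python sketch below computes transitive closures by a naive fixed-point iteration. It then checks, for every binary relation on a small domain, that closing $x = y \vee A$ yields the same relation as $s = t \vee TC(A)(s,t)$. The domain and the function names are illustrative assumptions. The check concerns only the semantics and says nothing about how either encoding interacts with proof rules, which is the point made in the text.

```python
from itertools import product


def transitive_closure(rel):
    """Least transitive relation containing `rel` (naive fixed-point iteration)."""
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step


def rtc_via_tc(rel, domain):
    """TC(λx,y.(x = y ∨ A)): close the relation extended with the identity."""
    identity = {(x, x) for x in domain}
    return transitive_closure(set(rel) | identity)


def rtc_semantic(rel, domain):
    """s = t ∨ TC(A)(s, t)."""
    identity = {(x, x) for x in domain}
    return identity | transitive_closure(rel)


def agree_on_all_relations(n):
    """Compare both definitions on every binary relation over {0, ..., n-1}."""
    domain = range(n)
    pairs = list(product(domain, repeat=2))
    for mask in range(1 << len(pairs)):
        rel = {p for i, p in enumerate(pairs) if mask >> i & 1}
        if rtc_via_tc(rel, domain) != rtc_semantic(rel, domain):
            return False
    return True
```

For n = 3 the last function enumerates all 512 relations on a three-element domain.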


Turning to the modal setting, PDL may be defined as the extension of PDL$^+$
by including a program A? for each formula A. Semantically, we have $(A?)^M =
\{(v, v) : M, v \models A\}$. From here we may define $\varepsilon := \top?$ and $\alpha^* := (\varepsilon \cup \alpha)^+$. Again,
while it is semantically correct to set $\alpha^* = \varepsilon \cup \alpha^+$, this encoding does not lift
to the standard sequent rules for *. The system LPD is obtained from LPD$^+$ by
including the rules:

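$$\dfrac{\Gamma, A \qquad \Gamma, B}{\Gamma, \langle A?\rangle B}\ \langle ? \rangle \qquad \dfrac{\Gamma, \bar{A}, B}{\Gamma, [A?]B}\ [?]$$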


Again, the notion of immediate ancestry is colour-coded as for LPD$^+$; the resulting
notions of (pre)proof, thread and progress are as in Definition 29. Just like
for LPD$^+$, a standard encoding of LPD into the μ-calculus yields its soundness
and completeness, thanks to known sequent systems for the latter, cf. [23,25].
These results have also been established independently [20]. Again, a modular adaptation
of the simulation of LPD$^+$ by HTC, cf. Sect. 6, yields:


Theorem 36 (Completeness for PDL). Let A be a PDL formula. If $\models A$
then $\mathrm{HTC}_= \vdash_{\mathrm{cyc}} ST(A)(c)$.


8 Conclusions 


In this work we proposed a novel cyclic system HTC for Transitive Closure 
Logic (TCL) based on a form of hypersequents. We showed a soundness theorem 
for standard semantics, requiring an argument bespoke to our hypersequents. 
Our system is cut-free, rendering it suitable for automated reasoning via proof 
search. We showcased its expressivity by demonstrating completeness for PDL, 
over the standard translation. In particular, we demonstrated formally that such 
expressivity is not available in the previously proposed system $\mathrm{TC}_G$ of Cohen and
Rowe (Theorem 12). Our system HTC locally simulates $\mathrm{TC}_G$ too (Theorem 19).

As far as we know, HTC is the first cyclic system employing a form of deep 
inference resembling alternation in automata theory, e.g. wrt. proof checking, 
cf. Proposition 18. It would be interesting to investigate the structural proof the- 
ory that emerges from our notion of hypersequent. As hinted at in Examples 20 
and 21, our hypersequential system exhibits more liberal rule permutations than 
usual sequents, so we expect their focussing and cut-elimination behaviours to 
similarly be richer, cf. [21,22]. Note however that such investigations are rather 
pertinent for pure predicate logic (without TC): focussing and cut-elimination 
arguments do not typically preserve regularity of non-wellfounded proofs, cf. [2]. 
Finally, our work bridges the cyclic proof theories of (identity-free) PDL and
(reflexive) TCL. With increasing interest in both modal and predicate cyclic 
proof theory, it would be interesting to further develop such correspondences. 


Acknowledgements. The authors would like to thank Sonia Marin, Jan Rooduijn 
and Reuben Rowe for helpful discussions on matters surrounding this work. 
Abstract. Equational unification and matching are fundamental mechanisms
in many automated deduction applications. Supporting them efficiently
for as wide as possible a class of equational theories, and in a
typed manner supporting type hierarchies, benefits many applications;
but this is both challenging and nontrivial. We present Maude 3.2’s efficient
support of these features as well as of symbolic reachability analysis
of infinite-state concurrent systems based on them.


1 Introduction 


Unification is a key mechanism in resolution [41] and paramodulation-based 
[36] theorem proving. Since Plotkin’s work [40] on equational unification, i.e., 
E-unification modulo an equational theory E, it is widely used for increased
effectiveness. Since Walther’s work [47] it has been well understood that typed 
E-unification, exploiting types and subtype hierarchies, can drastically reduce a 
prover’s search space. Many other automated deduction applications use typed 
E-unification as a key mechanism, including, inter alia: (i) constraint logic pro- 
gramming, e.g., [12,23]; (ii) narrowing-based infinite-state reachability analysis 
and model checking, e.g., [6,35]; (iii) cryptographic protocol analysis modulo 
algebraic properties, e.g., [8, 19,28]; (iv) partial evaluation, e.g., [4,5]; and (v) 
SMT solving, e.g., [32,48]. The special case of typed E-matching is also a key 
component in all the above areas as well as in: (vi) E-generalization (also called 
anti-unification), e.g., [1,2]; and (vii) E-homeomorphic embedding, e.g., [3]. 
Maximizing the scope and effectiveness of typed E-unification and E-matching
means efficiently supporting as wide a class of theories E as possible.
Such efficiency crucially depends on both efficient algorithms (and their combinations)
and, since the number of E-unifiers may be large, on computing
complete minimal sets of solutions to reduce the search space. The recent Maude
3.2 release¹ provides this kind of efficient support for typed E-unification and
E-matching in three increasingly general classes of theories E:


1. Typed B-unification and B-matching, where B is any combination of associativity
(A) and/or commutativity (C) and/or unit element (U) axioms.

2. Typed E ∪ B-unification and matching in the user-definable infinite class of
theories E ∪ B with B as in (1), and E ∪ B having the finite variant property
(FVP) [13,21].

3. Typed E ∪ B-unification for the infinite class of user-definable theories E ∪ B
with B as in (1), and E confluent, terminating, and coherent modulo B.


For classes (1) and (2) the set of B- (resp. E ∪ B-) unifiers is always complete,
minimal and finite, except for the AwoC case, when B contains an A but not a C
axiom for some binary symbol f.² The typing is order-sorted [22,29] and thus
contains many-sorted and unsorted B- (resp. E ∪ B-) unification as special cases.
For class (3), Maude enumerates a possibly infinite complete set of E ∪ B-unifiers,
with the same AwoC exception on B. We discuss new features for classes (1)–(2),
and a new narrowing modulo E ∪ B-based symbolic reachability analysis feature
for infinite-state systems specified in Maude as rewrite theories (Σ, E ∪ B, R)
with equations E ∪ B in class (2) and concurrent transition rules R. In Sect. 5
we discuss various applications that can benefit from these new features.

In comparison with previous Maude tool papers reporting on new features
(the last one was [16]), the new features reported here include: (i) computing
minimal complete sets of most general B- (resp. E ∪ B-) unifiers for classes (1)
and (2) except for the AwoC case; (ii) a new E ∪ B-matching algorithm for
class (2); and (iii) a new symbolic reachability analysis for concurrent systems
based on narrowing with transition rules modulo equations E ∪ B in class (2),
enjoying powerful state-space reduction capabilities based on the minimality and
completeness feature (i) and on “folding” less general symbolic states into more
general ones through subsumption. Section 3.1 shows the importance of the new
E ∪ B-matching algorithm for efficient computation of minimal E ∪ B-unifiers.

¹ Publicly available at http://maude.cs.illinois.edu.

² In the AwoC case, Maude’s algorithms are optimized to favor many commonly occurring
cases where typed A-unification is finitary, and provide a finite set of solutions
and an incompleteness warning outside such cases (see [18]).


Notation, Strict-B-Coherence, and FVP. For notation involving either term
positions, $p \in pos(t)$, $t|_p$, $t[t']_p$, or substitutions, $t\theta$, $\theta\mu$, see [14]. Equations
$(u = v) \in E$ oriented as rules $(u \to v) \in \vec{E}$ are strictly coherent modulo axioms
B iff $(t =_B t' \wedge t \to_{\vec{E},B} w) \Rightarrow \exists w'(t' \to_{\vec{E},B} w' \wedge w =_B w')$, where $t \to_{\vec{E},B} w$
iff $\exists (u \to v) \in \vec{E}, \exists \theta, \exists p \in pos(t)\,(u\theta =_B t|_p \wedge w = t[v\theta]_p)$. For $(\Sigma, E \cup B)$ an
equational theory with $\vec{E}$ confluent, terminating and strictly coherent modulo
B: (1) an $\vec{E},B$-variant of t is a pair $(v, \theta)$ s.t. $v =_B (t\theta)!_{\vec{E},B} \wedge \theta =_B \theta!_{\vec{E},B}$, where
$u!_{\vec{E},B}$ (resp. $\theta!_{\vec{E},B}$) denotes the $\vec{E},B$-normal form of u, resp. θ; (2) for $\vec{E},B$-variants
$(v, \theta)$, $(u, \mu)$ of t, the more-general relation $(v, \theta) \sqsupseteq_B (u, \mu)$ holds iff $\exists \gamma\,(u =_B v\gamma \wedge \theta\gamma =_B \mu)$;
(3) $(\Sigma, E \cup B)$ is FVP [13,21] iff any Σ-term t has a finite set
of most general $\vec{E},B$-variants. Footnote 5 explains how FVP can be checked.


2 Complete and Minimal Order-Sorted B-Unifiers 


Throughout the paper we use the following equational theory E ∪ B of the
Booleans as a running example (with self-explanatory, user-definable syntax³):


fmod BOOL-FVP is protecting TRUTH-VALUE .
  op _and_ : Bool Bool -> Bool [assoc comm] .
  op _xor_ : Bool Bool -> Bool [assoc comm] .
  op not_ : Bool -> Bool .
  op _or_ : Bool Bool -> Bool .
  op _<=>_ : Bool Bool -> Bool .
  vars X Y Z W : Bool .

  eq X and true = X [variant] .
  eq X and false = false [variant] .
  eq X and X = X [variant] .
  eq X and X and Y = X and Y [variant] . *** AC extension
  eq X xor false = X [variant] .
  eq X xor X = false [variant] .
  eq X xor X xor Y = Y [variant] . *** AC extension
  eq not X = X xor true [variant] .
  eq X or Y = (X and Y) xor X xor Y [variant] .
  eq X <=> Y = true xor X xor Y [variant] .
endfm


3 This module imports Maude’s TRUTH-VALUE module and the command “set include 
BOOL off .” must be typed before the module to avoid default importation of BOOL. 


532 F. Duran et al. 


The axioms B are the associativity-commutativity (AC) axioms for xor and and
(specified with the assoc comm attributes). The equations E are terminating and
confluent modulo B [42]. To achieve strict B-coherence [30], the needed AC-extensions
[39] are added; for example, the AC-extension of X xor X = false
is X xor X xor Y = Y. The equations E for xor and and define the theory of
Boolean rings, except for the missing⁴ distributivity equation X and (Y xor Z)
= (X and Y) xor (X and Z). The remaining equations in E define or, not and
<=> as definitional extensions. The variant attribute declares that the equation
will be used for folding variant narrowing [21]. The theory is FVP⁵ and thus in class (2).
In this section we consider B-unification (for B = AC) using this example;
E ∪ B-unification for the same example is discussed in Sect. 3.
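Since the initial algebra of BOOL-FVP is the two-element Boolean algebra (footnote 4), every equation above must hold under the usual truth tables. So must the omitted distributivity law. The Python sketch below checks this by brute force over all assignments to X, Y, Z. It is only a semantic sanity check of the equations. It does not test termination, confluence, strict B-coherence or FVP, which are addressed by the cited references and by the check described in footnote 5. The helper names are assumptions of the example.

```python
from itertools import product


def AND(x, y):
    return x and y


def XOR(x, y):
    return x != y


def NOT(x):
    return not x


def OR(x, y):
    return x or y


def IFF(x, y):
    return x == y


# (equation as written in BOOL-FVP, left-hand side, right-hand side)
EQUATIONS = [
    ("X and true = X", lambda X, Y, Z: AND(X, True), lambda X, Y, Z: X),
    ("X and false = false", lambda X, Y, Z: AND(X, False), lambda X, Y, Z: False),
    ("X and X = X", lambda X, Y, Z: AND(X, X), lambda X, Y, Z: X),
    ("X and X and Y = X and Y", lambda X, Y, Z: AND(AND(X, X), Y), lambda X, Y, Z: AND(X, Y)),
    ("X xor false = X", lambda X, Y, Z: XOR(X, False), lambda X, Y, Z: X),
    ("X xor X = false", lambda X, Y, Z: XOR(X, X), lambda X, Y, Z: False),
    ("X xor X xor Y = Y", lambda X, Y, Z: XOR(XOR(X, X), Y), lambda X, Y, Z: Y),
    ("not X = X xor true", lambda X, Y, Z: NOT(X), lambda X, Y, Z: XOR(X, True)),
    ("X or Y = (X and Y) xor X xor Y", lambda X, Y, Z: OR(X, Y),
     lambda X, Y, Z: XOR(XOR(AND(X, Y), X), Y)),
    ("X <=> Y = true xor X xor Y", lambda X, Y, Z: IFF(X, Y),
     lambda X, Y, Z: XOR(XOR(True, X), Y)),
    # The distributivity law omitted from E, expected to hold in the initial algebra.
    ("X and (Y xor Z) = (X and Y) xor (X and Z)", lambda X, Y, Z: AND(X, XOR(Y, Z)),
     lambda X, Y, Z: XOR(AND(X, Y), AND(X, Z))),
]


def failing_equations():
    """Equations that fail for some Boolean assignment to X, Y, Z."""
    return [name for name, lhs, rhs in EQUATIONS
            if any(lhs(*v) != rhs(*v) for v in product([False, True], repeat=3))]
```

An empty result from `failing_equations()` is consistent with footnote 4. A wrong equation added to the list, such as `X xor true = X`, would be reported.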

For B any combination of associativity and/or commutativity and/or identity
axioms, Maude’s unify command computes a complete finite set of most
general B-unifiers, except for the AwoC case. The new irredundant unify command
always returns⁶ a finite, complete and minimal set of B-unifiers, except
for the AwoC case. The output of unify for the equation below can be found
in [10, §13].


Maude> irredundant unify X and not Y and not Z =? W and Y and not X .
Decision time: 0ms cpu (0ms real)

Unifier 1
X --> #1:Bool and #2:Bool
Z --> #1:Bool and #2:Bool
Y --> #1:Bool
W --> #2:Bool and not #1:Bool

Unifier 2
X --> #2:Bool
Z --> #1:Bool
Y --> #2:Bool
W --> not #1:Bool
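Because this is B-unification, each reported unifier should make both sides of the equation equal up to associativity and commutativity of and alone. Here not is treated as an uninterpreted symbol and the equations E are not used. The Python sketch below checks this for the two unifiers above. It flattens nested and-terms and sorts their arguments, a standard canonical form for equality modulo AC. The term encoding and function names are assumptions. The check confirms only that these substitutions are B-unifiers, not that the returned set is complete or minimal.

```python
def V(name):
    """A variable, e.g. V("X") or the fresh variable V("#1")."""
    return ("var", name)


def normalize(term):
    """AC canonical form: flatten nested `and` and sort its arguments."""
    if term[0] == "and":
        args = []
        for arg in map(normalize, term[1:]):
            args.extend(arg[1:] if arg[0] == "and" else [arg])
        return ("and", *sorted(args, key=repr))
    if term[0] == "not":
        return ("not", normalize(term[1]))
    return term  # a variable


def substitute(term, sigma):
    """Apply a substitution given as a dict from variable names to terms."""
    if term[0] == "var":
        return sigma.get(term[1], term)
    return (term[0], *(substitute(arg, sigma) for arg in term[1:]))


def ac_equal(s, t):
    return normalize(s) == normalize(t)


# X and not Y and not Z =? W and Y and not X
lhs = ("and", V("X"), ("not", V("Y")), ("not", V("Z")))
rhs = ("and", V("W"), V("Y"), ("not", V("X")))

unifier1 = {"X": ("and", V("#1"), V("#2")), "Z": ("and", V("#1"), V("#2")),
            "Y": V("#1"), "W": ("and", V("#2"), ("not", V("#1")))}
unifier2 = {"X": V("#2"), "Z": V("#1"), "Y": V("#2"), "W": ("not", V("#1"))}
```

Under `unifier2`, for instance, both sides become the AC-multiset {#2, not #2, not #1}.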


3 E ∪ B-Unification and Matching for FVP Theories


It is a general result from [21] that if E ∪ B is FVP and B-unification is finitary,
then E ∪ B-unification is finitary and a complete finite set of E ∪ B-unifiers
can be computed by folding variant narrowing [21]. Furthermore, assuming that
$T_{\Sigma/E\cup B,s}$ is non-empty for each sort s, a finitary E ∪ B-unification algorithm
automatically provides a decision procedure for satisfiability of any positive (the
∧,∨-fragment) quantifier-free formula φ in the initial algebra $T_{\Sigma/E\cup B}$, since φ can
be put in DNF, and a conjunction of equalities Γ is satisfiable in $T_{\Sigma/E\cup B}$ iff Γ is
E ∪ B-unifiable.

Since for our running example BOOL-FVP the equations E ∪ B are FVP and
B-unification (in this case B = AC) is finitary, all this has useful consequences for

4 Because distributivity is missing, this theory is weaker than the theory of Boolean rings.
Nevertheless, its initial algebra $T_{\Sigma/E\cup B}$ is exactly the Booleans on {true, false}
with the standard truth tables for all connectives. Thus, all equations provable in
Boolean algebra hold in $T_{\Sigma/E\cup B}$, including the missing distributivity equation.

5 This can be easily checked in Maude by checking the finiteness of the variants for 
each f(X), resp. f(X,Y), for each unary, resp. binary, symbol f in BOOL-FVP using 
the get variants command; see [9] for a theoretical justification of this check. 

6 Fresh variables follow the form #1:Bool. 
BOOL-FVP. Indeed, $T_{\Sigma/E\cup B}$ is exactly the Booleans⁷ on {true, false} with the
well-known truth tables for and, xor, not, or and <=>. This means that
E ∪ B-unification provides a Boolean satisfiability decision procedure for a Boolean
expression u on such symbols, namely, u is Boolean satisfiable iff the equation
u = true is E ∪ B-unifiable. Furthermore, a ground assignment ρ to the variables
of u is a satisfying assignment for u iff there exist an E ∪ B-unifier α of u = true
and a ground substitution δ such that ρ = αδ. For the same reasons, u is a
Boolean tautology iff the equation u = false has no E ∪ B-unifiers.

A complete, finite set of E ∪ B-unifiers can be computed with Maude's
variant unify command whenever E ∪ B is FVP, except for the AwoC case.
In contrast, the new filtered variant unify command computes a finite,
complete and minimal set of E ∪ B-unifiers, which can be considerably smaller
than that computed by variant unify. For our BOOL-FVP example, filtered
variant unify gives us a Boolean satisfiability decision procedure plus a
symbolic specification of satisfying assignments. Such a procedure is not practical,
since it cannot compete with standard SAT solvers; but that was never our purpose.
Our purpose here is to illustrate with simple examples how E ∪ B-unification works
for the infinite class of user-definable FVP theories E ∪ B, of which BOOL-FVP
is just a simple example; dozens of other examples can be found in [32].

The difference between the variant unify command and the new filtered variant
unify command is illustrated with the following example; its unfiltered output
can be found in [10, §14]. Note that the single E ∪ B-unifier gives us a compact
symbolic description of this Boolean expression's satisfying assignments.


Maude> filtered variant unify (X or Y) <=> Z =? true . 
rewrites: 3224 in 12765ms cpu (14776ms real) (252 rewrites/second) 


Unifier 1 

X --> #1:Bool xor #2:Bool 

Y --> #1:Bool 

Z --> #2:Bool xor (#1:Bool and (#1:Bool xor #2:Bool)) 


No more unifiers. 
Advisory: Filtering was complete. 
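That the single unifier describes all satisfying assignments can be confirmed by brute force. In this hypothetical Python sketch, the fresh variables #1:Bool and #2:Bool are enumerated over both truth values, and the resulting ground instances of X, Y, Z are compared with the truth-table solutions of (X or Y) <=> Z.

```python
from itertools import product

def xor(x, y):
    return x != y

def formula(X, Y, Z):
    """(X or Y) <=> Z under the standard truth tables."""
    return (X or Y) == Z

def unifier_instance(v1, v2):
    """Ground instance of Unifier 1 for #1:Bool = v1 and #2:Bool = v2."""
    X = xor(v1, v2)
    Y = v1
    Z = xor(v2, v1 and xor(v1, v2))
    return (X, Y, Z)

bools = [False, True]
satisfying = {v for v in product(bools, repeat=3) if formula(*v)}
covered = {unifier_instance(v1, v2) for v1, v2 in product(bools, repeat=2)}
print(satisfying == covered, len(satisfying))  # -> True 4
```

Every ground instance of the unifier satisfies the formula, and every satisfying assignment arises this way, as the text states.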


The computation of a minimal set of E ∪ B-unifiers relies on filtering by
E ∪ B-matching between two E ∪ B-unifiers, as explained in the following section.


3.1 FVP E ∪ B-Matching and Minimality of E ∪ B-Unifiers


By definition, a term u E ∪ B-matches another term v iff there is a substitution
γ such that $u =_{E\cup B} v\gamma$. Besides the existing match command modulo axioms


7 Each connective's truth table can be checked with Maude's reduce command. Actually,
one need only check and and xor (the other connectives are definitional extensions).

8 In Maude, different command names are used to emphasize different algorithms. 
The word ‘filtered’ is used instead of ‘irredundant’ because irredundancy is not 
guaranteed in the AwoC case. 
B, Maude's new variant match command computes a complete, minimal set
of E ∪ B-matching substitutions for any FVP theory E ∪ B in class (2), except
for the AwoC case. Such an algorithm could always be derived from an
E ∪ B-unification algorithm by replacing u by ū, where all variables in u are replaced
by fresh constants, and computing the E ∪ B-unifiers of ū = v. However, a more
efficient special-purpose algorithm has been designed and implemented for this
purpose. E ∪ B-matching algorithms are automatically provided by Maude for
any user-definable theory in class (2) with the variant match command.


Maude> variant match in BOOL-FVP : Z and W <=? X .
rewrites: 12 in 21ms cpu (27ms real) (545 rewrites/second)

Matcher 1
Z --> true
W --> X

Matcher 2
Z --> X
W --> true

Matcher 3
Z --> X
W --> X
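A weaker, purely semantic sanity check of the three matchers can be written as follows: it tests equality of truth functions in X rather than E ∪ B-equality of terms, so it is a hypothetical Python illustration and not how variant match works.

```python
# Each matcher maps the pattern variables Z, W to functions of the subject
# variable X (X itself is treated as a constant and never instantiated).
MATCHERS = [
    {"Z": lambda X: True, "W": lambda X: X},   # Matcher 1
    {"Z": lambda X: X, "W": lambda X: True},   # Matcher 2
    {"Z": lambda X: X, "W": lambda X: X},      # Matcher 3
]

def is_semantic_matcher(m):
    """(Z and W)m and X denote the same truth function of X."""
    return all((m["Z"](X) and m["W"](X)) == X for X in (False, True))

print([is_semantic_matcher(m) for m in MATCHERS])  # -> [True, True, True]
```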


This is a good moment to ask and answer a relevant question: why is computing
a complete minimal set of E ∪ B-unifiers for a unification problem Γ,
where E ∪ B is an FVP theory in class (2) except for the AwoC case,
nontrivial? We first need to explain how minimality is achieved. Suppose that α
and β are two E ∪ B-unifiers of a system of equations Γ with, say, typed
variables $x_1,\ldots,x_n$. We then say that α is more general than β modulo E ∪ B,
denoted $\alpha \sqsupseteq_{E\cup B} \beta$, iff there is a substitution γ such that for each $x_i$, $1 \le i \le n$,
$\gamma(\alpha(x_i)) =_{E\cup B} \beta(x_i)$. But this means exactly that the vector $[\beta(x_1),\ldots,\beta(x_n)]$
E ∪ B-matches the vector $[\alpha(x_1),\ldots,\alpha(x_n)]$ with E ∪ B-matching substitution
γ. A complete set of E ∪ B-unifiers of Γ is by definition minimal iff for any two
different unifiers α and β in it we have $\alpha \not\sqsupseteq_{E\cup B} \beta$ and $\beta \not\sqsupseteq_{E\cup B} \alpha$, i.e., the two
associated E ∪ B-matching problems fail.

What is nontrivial is computing a minimal complete set of E ∪ B-unifiers
efficiently. One could do so inefficiently by simulating E ∪ B-matching with
E ∪ B-unification, and more efficiently by using an E ∪ B-matching algorithm. Maude
achieves still greater efficiency by directly computing the $\alpha \sqsupseteq_{E\cup B} \beta$ relation.
The key difference between the variant unify command and the new filtered
variant unify command is that the latter computes an E ∪ B-minimal set of
E ∪ B-unifiers of Γ using the $\alpha \sqsupseteq_{E\cup B} \beta$ relation, whereas the former only computes
a set of B-minimal E ∪ B-unifiers of Γ using the cheaper $\alpha \sqsupseteq_{B} \beta$ relation. Three
ideas make this fast in practice: (i) variant matching is faster
than variant unification because one side is variable-free; (ii) enumerating the
variant matchers between two variant unifiers is far more expensive than checking
the existence of a matcher, so only existence is checked; and (iii) variant unifiers
are discarded on the fly, avoiding further narrowing steps and computation.
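The filtering idea can be illustrated in a much simpler setting. The Python sketch below replaces E ∪ B-matching by plain syntactic matching (the empty theory), an assumption made for brevity; it is not Maude's variant-matching algorithm. It implements the "more general than" check as a single matcher for the whole vector of bindings, as in the definition above, and keeps only unifiers that are not instances of another one. All names are hypothetical.

```python
# A variable is a str (e.g. "#1"); any other term is a tuple
# (f, arg1, ..., argn), so a constant a is ("a",).

def is_var(t):
    return isinstance(t, str)

def match(pattern, subject, gamma):
    """Extend gamma so that pattern instantiated by gamma equals subject
    (syntactically); variables of subject are treated as constants.
    Returns the extended substitution, or None if there is no match."""
    if is_var(pattern):
        if pattern in gamma:
            return gamma if gamma[pattern] == subject else None
        return {**gamma, pattern: subject}
    if is_var(subject) or pattern[0] != subject[0] or len(pattern) != len(subject):
        return None
    for p, s in zip(pattern[1:], subject[1:]):
        gamma = match(p, s, gamma)
        if gamma is None:
            return None
    return gamma

def more_general(alpha, beta, xs):
    """alpha is at least as general as beta on xs: one matcher gamma maps
    the vector [alpha(x) for x in xs] onto [beta(x) for x in xs]."""
    gamma = {}
    for x in xs:
        gamma = match(alpha[x], beta[x], gamma)
        if gamma is None:
            return False
    return True

def minimize(unifiers, xs):
    """Drop every unifier that is an instance of another kept one."""
    kept = []
    for u in unifiers:
        if any(more_general(k, u, xs) for k in kept):
            continue  # u is redundant
        kept = [k for k in kept if not more_general(u, k, xs)]
        kept.append(u)
    return kept

u1 = {"X": "#1", "Y": ("f", "#1")}
u2 = {"X": ("a",), "Y": ("f", ("a",))}  # instance of u1 via #1 -> a
u3 = {"X": ("b",), "Y": ("g",)}
print(minimize([u2, u1, u3], ["X", "Y"]) == [u1, u3])  # -> True
```

Only the existence of a matcher is checked, never the full set of matchers.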


4 Narrowing-Based Symbolic Reachability Analysis 


In Maude, concurrent systems are specified in so-called system modules as rewrite
theories of the form $\mathcal{R} = (\Sigma, G, R)$, where G is an equational theory either of the
form B in class (1), or E ∪ B in classes (2) or (3), and R are the system transition
rules, specified as rewrite rules. When the theory $\mathcal{R}$ is topmost, meaning that the
rules R rewrite the entire state, narrowing with rules R modulo the equations
G is a complete symbolic reachability analysis method for infinite-state systems
[35]. That is, given a term u with variables $\vec{x}$, representing a typically infinite
set of initial states, and another term v with variables $\vec{y}$, representing a possibly
infinite set of target states, narrowing can answer the question: can an instance
of u reach an instance of v? That is, does the formula $\exists \vec{x}, \vec{y}.\ u \rightarrow^* v$ hold in
$\mathcal{R}$? Note that, if the complement of a system invariant I can be symbolically
described as the set of ground instances of terms in a set $\{v_1,\ldots,v_n\}$ of pattern
terms, then narrowing provides a semi-decision procedure for verifying whether
the system specified by $\mathcal{R}$ fails to satisfy I starting from an initial set of states
specified by u. Namely, I holds iff no instance of any $v_i$ can be reached from
some instance of u.

Assuming G is in class (1) or (2), Maude's vu-narrow command implements
narrowing with R modulo G by performing G-unification at each narrowing
step. However, the number of symbolic states that need to be explored can be
infinite. This means that if no solution exists for the narrowing search, Maude
will search forever, so that only depth-bounded searches will terminate. The great
advantage of the new {fold} vu-narrow {filter,delay} command is that it
performs a powerful symbolic state space reduction by: (i) removing a newly
explored symbolic state v' if it E ∪ B-matches a previously explored state v, and
replacing transitions with target v' by transitions with target v; and (ii) using
minimal sets of E ∪ B-unifiers for each narrowing step and for checking common
instances between a newly explored state and the target term (enabled by the
filter and delay options). This can make the entire search space finite and allow full
verification of invariants for some infinite-state systems. Consider the following
Maude specification of Lamport's bakery protocol.


mod BAKERY is
sorts Nat LNat Nat? State WProcs Procs .
subsorts Nat LNat < Nat? . subsort WProcs < Procs .
op 0 : -> Nat .
op s : Nat -> Nat .
op [_] : Nat -> LNat . *** number-locking operator
op < wait,_> : Nat -> WProcs .
op < crit,_> : Nat -> Procs .
op mt : -> WProcs . *** empty multiset
op __ : Procs Procs -> Procs [assoc comm id: mt] . *** union
op __ : WProcs WProcs -> WProcs [assoc comm id: mt] . *** union
op _|_|_ : Nat Nat? Procs -> State .

vars n m i j k : Nat . var x? : Nat? . var PS : Procs . var WPS : WProcs .

rl [new]: m | n | PS => s(m) | n | < wait,m > PS [narrowing] .
rl [enter]: m | n | < wait,n > PS => m | [n] | < crit,n > PS [narrowing] .
rl [leave]: m | [n] | < crit,n > PS => m | s(n) | PS [narrowing] .
endm
The states of BAKERY have the form "m | x? | PS", with m the ticket-dispensing
counter, x? the (possibly locked) counter to access the critical section, and PS a
multiset of processes either waiting or in the critical section. BAKERY is
infinite-state: [new] creates new processes, and the counters can grow unboundedly.
When a waiting process enters the critical section with [enter], the second
counter n is locked as [n]; it is unlocked and incremented when the process
leaves with [leave]. The key invariant is mutual exclusion. Note that the term
"i | x? | < crit, j > < crit, k > PS" describes all states in the complement
of the mutual exclusion states. Without the fold option, narrowing does not
terminate, but with the following command we can verify that BAKERY satisfies
mutual exclusion, not just for the initial state "0 | 0 | mt", but for the much
more general infinite set of initial states with waiting processes only,
"m | n | WPS".


Maude> {fold} vu-narrow {filter,delay}
m | n | WPS =>* i | x? | < crit, j > < crit, k > PS .


No solution. 
rewrites: 4 in 1ms cpu (1ms real) (2677 rewrites/second) 
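For contrast, the Python sketch below explores BAKERY explicitly from the single concrete initial state 0 | 0 | mt, up to a depth bound, and checks that no reachable state has two processes in crit. It is a hypothetical illustration of the three rules and of the invariant only: unlike the folding narrowing command, it proves nothing beyond the bound and says nothing about the symbolic initial states m | n | WPS. The state encoding and the names successors and explore are assumptions of the sketch.

```python
from collections import deque

# A state m | x? | PS is encoded as (m, (locked, n), procs): locked is
# True for [n]; procs is a sorted tuple of ("wait", t) / ("crit", t)
# pairs standing for the multiset PS.

def successors(state):
    """One-step successors under the rules [new], [enter] and [leave]."""
    m, (locked, n), procs = state
    out = []
    if not locked:
        # [new]: m | n | PS => s(m) | n | < wait,m > PS
        out.append((m + 1, (False, n), tuple(sorted(procs + (("wait", m),)))))
        # [enter]: m | n | < wait,n > PS => m | [n] | < crit,n > PS
        if ("wait", n) in procs:
            rest = list(procs)
            rest.remove(("wait", n))
            out.append((m, (True, n), tuple(sorted(rest + [("crit", n)]))))
    elif ("crit", n) in procs:
        # [leave]: m | [n] | < crit,n > PS => m | s(n) | PS
        rest = list(procs)
        rest.remove(("crit", n))
        out.append((m, (False, n + 1), tuple(rest)))
    return out

def explore(depth):
    """Breadth-first search from 0 | 0 | mt for at most `depth` steps.
    Returns (number of states seen, whether a state with two processes
    in the critical section was found)."""
    init = (0, (False, 0), ())
    seen, frontier = {init}, deque([(init, 0)])
    while frontier:
        state, d = frontier.popleft()
        if sum(1 for kind, _ in state[2] if kind == "crit") >= 2:
            return len(seen), True
        if d < depth:
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
    return len(seen), False

print(explore(10))
```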


The new vu-narrow {filter,delay} command can achieve dramatic state
space reductions over the previous vu-narrow command by filtering
E ∪ B-unifiers. This is illustrated by a simple cryptographic protocol example in [10,
§15] exploiting the unitary nature of unification in the exclusive-or theory [24].


5 Applications and Conclusion 


Maude can be used as a meta-tool to develop new formal tools because: (i) its
underlying equational and rewriting logics are logical (and reflective
metalogical) frameworks [7,27,46]; (ii) it efficiently supports logical reflection
through its META-LEVEL module; (iii) it offers rewriting, search, model checking,
and strategy language features [11,15]; and (iv) it offers symbolic reasoning
features [15,33], the latest of which are reported here. We refer to [11,15,31,33] for
references on various Maude-based tools. Many of them can benefit from these new features.

By way of example, we mention some areas ready to reap such benefits: (1)
Formal Analysis of Cryptographic Protocols. The new features can yield substantial
improvements to tools such as Maude-NPA [19], Tamarin [28] and AKISS [8].
(2) Model Checking of Infinite-State Systems. The narrowing-based LTL symbolic
model checker reported in [6,20], and the addition of new symbolic capabilities
to Real-Time Maude [37,38] can both benefit from the new features. (3)
SMT Solving. In Sect. 3 we noted that FVP E ∪ B-unification makes satisfiability
of positive QF formulas in $T_{\Sigma/E\cup B}$ decidable. Under mild conditions, this has
been extended in [32,44] to a procedure for satisfiability in $T_{\Sigma/E\cup B}$ of all QF
formulas, which will also benefit from the new features. (4) Theorem Proving.
The new Maude Inductive Theorem Prover under construction [34], as well as
Maude's Invariant Analyzer [43] and Reachability Logic Theorem Prover [45], all
use equational unification and narrowing modulo equations; so all will benefit
from the new features. (5) Theory Transformations based on equational
unification, e.g., partial evaluation [4], ground confluence methods [17] or program
termination methods [25,26], could likewise become more efficient.

In conclusion, we have presented and illustrated with examples new equa- 
tional unification and matching, and symbolic reachability analysis features in 
Maude 3.2. Thanks to the above-mentioned properties (i)-(iv) of Maude as a 
meta-tool, we hope that this work will encourage other researchers to use Maude 
and its symbolic features to develop new tools in many different logics. 
Abstract. The ontology of Leśniewski is commonly regarded as the
most comprehensive calculus of names and the theoretical basis of
mereology. However, ontology has not so far been examined by means of
proof-theoretic methods. In the paper we provide a characterization of elementary
ontology as a sequent calculus satisfying the desiderata usually formulated
for rules in well-behaved systems in modern structural proof theory. In
particular, the cut elimination theorem is proved and a version of the
subformula property holds for the cut-free version.
1 Introduction


The ontology of Leśniewski is a kind of calculus of names proposed as a
formalization of logic alternative to the Fregean paradigm. Basically, it is a theory of the
binary predicate ε understood as the formalization of the Greek 'esti'. Informally,
a formula aεb is to be read as "(the) a is (a/the) b", so in order to be true a
must be an individual name, whereas b can be an individual or general name. In the
original formulation, Leśniewski's ontology is the middle part of a hierarchical
structure also involving protothetics and mereology (see the presentation in
Urbaniak [20]). Protothetics, a very general form of propositional logic, is the
basis of the overall construction. Its generality follows from the fact that, in
addition to sentence variables, arbitrary sentence-functors (connectives) are allowed
as variables, and quantifiers binding all these kinds of variables are involved.
Similarly, in Leśniewski's ontology we have quantification over name variables
but also over arbitrary name-functors creating complex names. In consequence,
we obtain a very expressive logic, which is then extended to mereology. The latter,
which is the best-known ingredient of Leśniewski's construction, is a theory
of the parthood relation, which provides an alternative formalization of the theory
of classes and foundations of mathematics.

Despite the dependence of Leśniewski's ontology on his protothetics, we
can examine this theory, in particular its part called elementary ontology, in
isolation, as a kind of first-order theory of ε based on classical first-order logic
(FOL). Elementary ontology, in this sense, was investigated, among others, by
Słupecki [17] and Iwanuś [7], and we follow this line here. The expressive power
of such an approach is strongly reduced; in particular, quantifiers apply only to
name variables. One should note, however, that despite appearances it
is not just another elementary theory in the standard sense, since the range of
variables is not limited to individual names but admits general and even empty
names. Thus, name variables may represent not only 'Napoleon Bonaparte' but
also 'an emperor' and 'Pegasus'. This leads to several problems concerning the
interpretation of quantifiers in ontology, encountered in the semantical
treatment (see e.g. Küng and Canty [8] or Rickey [16]). However, the problems
of proper interpretation are not important here, since we develop a purely
syntactical formulation, which is shown to be equivalent to Leśniewski's axiomatic
formulation.

Taking into account the importance and originality of Leśniewski's ontology,
it is interesting, if not surprising, that so far no proof-theoretic study has been
offered, in particular in terms of sequent calculus (SC). In fact, a form of natural
deduction proof system was applied by many authors following Leśniewski's original
way of presenting proofs (see, e.g., his [9–11]). However, this can
hardly be treated as a proof-theoretic study of Leśniewski's ontology; it is only
a convenient way of simplifying the presentation of axiomatic proofs. Ishimoto
and Kobayashi [6] also introduced a tableau system for part of (quantifier-free)
ontology; we will say more about this system later.

In this paper we present a sequent calculus for elementary ontology and focus
on its most important properties. More specifically, in Sect. 2 we briefly
characterise the elementary ontology which will be the object of our study. In Sect. 3 we
present an adequate sequent calculus for the basic part of elementary ontology
and prove that it is equivalent to the axiomatic formulation. Then we prove
the cut elimination theorem for this calculus in Sect. 4. In the next section we
focus on the problem of extensionality and discuss some alternative
formulations of ontology and some of its parts, as well as its intuitionistic version.
Section 6 shows how the basic system can be extended with rules for new
predicate constants which preserve cut elimination. The problem of extension with
rules for term constants is discussed briefly in Sect. 7. A summary of the obtained
results and open problems closes the paper.


2 Elementary Ontology 


Roughly, in this article, by Leśniewski's elementary ontology we mean
standard FOL (in some chosen adequate formalization) with Leśniewski's axiom
LA added. For a more detailed general presentation of Leśniewski's systems one
may consult Urbaniak [20], and for a detailed study of Leśniewski's ontology
see Iwanuś [7] or Słupecki [17]. In the next section we will select a particular
sequent system as representing FOL and investigate several possible ways of
representing LA in this framework.

We will consider two languages for ontology. In both we assume a denumerable
set of name variables. Following Gentzen's well-known custom, we apply
a graphical distinction between the bound variables, which will be denoted by
x, y, z, ... (possibly with subscripts), and the free variables, usually called
parameters, which will be denoted by a, b, c, .... These are the only terms we admit, and
both kinds will be called simply name variables. The basic language $L_0$ consists
of the following vocabulary:


- connectives: ¬, ∧, ∨, →;
- first-order quantifiers: ∀, ∃;
- predicate: ε.


As we can see, in addition to the standard logical vocabulary of FOL, the
only specific constant is a binary predicate ε with the formation rule: tεt' is
an atomic formula, for any terms t, t'. In what follows we will use a convention:
instead of tεt' we will write tt'. The complexity of formulae of $L_0$ is defined as
the number of occurrences of logical constants, i.e. connectives and quantifiers.
Hence the complexity of atomic formulae is 0.

The language $L_P$, considered in Sect. 6, adds to this vocabulary a number of
unary and binary predicates: D, V, S,G,U,=,=, ~,ē, C, É, A, E, I, O. 

In $L_0$ and $L_P$ we have name variables, which range over all names (individual,
general and empty), as the only terms. However, Leśniewski also considered
complex terms built with the help of specific term-forming functors. We will
briefly discuss such extensions in the setting of sequent calculus in Sect. 7 and
note important problems they generate for a decent proof-theoretic treatment.

The only specific axiom of elementary ontology is Leśniewski's axiom LA:

$$\forall xy\,\bigl(xy \leftrightarrow \exists z(zx) \wedge \forall z(zx \to zy) \wedge \forall zv(zx \wedge vx \to zv)\bigr)$$

LA→ and LA← will be used to refer to the respective implications forming LA,
with the outer universal quantifier dropped. Note that:


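Lemma 1. The following formulae are equivalent to LA:

1. $\forall xy\,\bigl(xy \leftrightarrow \exists z(zx \wedge zy) \wedge \forall zv(zx \wedge vx \to zv)\bigr)$
2. $\forall xy\,\bigl(xy \leftrightarrow \exists z(zx \wedge zy \wedge \forall v(vx \to vz))\bigr)$
3. $\forall xy\,\bigl(xy \leftrightarrow \exists z(\forall v(vx \leftrightarrow vz) \wedge zy)\bigr)$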
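The equivalences in Lemma 1 can be sanity-checked in a small model, assuming the common set-theoretic reading of ε (an assumption for this illustration only, since the paper deliberately works syntactically): every name denotes a subset of a finite domain, possibly empty or with several elements, and ab holds iff a denotes exactly one object and that object falls under b. Quantifiers range over all names, as in the text. The hypothetical Python sketch below checks, over a three-element domain, that LA and the three forms of Lemma 1 all hold; it is a check in one model, not a proof of equivalence.

```python
from itertools import combinations

# Hypothetical finite model: names denote subsets of a 3-element domain,
# including the empty name and general (multi-element) names.
DOMAIN = range(3)
NAMES = [frozenset(c) for r in range(len(DOMAIN) + 1)
         for c in combinations(DOMAIN, r)]

def eps(a, b):
    """ab ('a is b'): a denotes exactly one object, which falls under b."""
    return len(a) == 1 and a <= b

def some(p):   # existential quantifier over all names
    return any(p(z) for z in NAMES)

def every(p):  # universal quantifier over all names
    return all(p(z) for z in NAMES)

def rhs_forms(x, y):
    """Right-hand sides of LA and of forms 1-3 of Lemma 1 for names x, y."""
    at_most_one = every(lambda z: every(
        lambda v: not (eps(z, x) and eps(v, x)) or eps(z, v)))
    return {
        "LA": (some(lambda z: eps(z, x))
               and every(lambda z: not eps(z, x) or eps(z, y))
               and at_most_one),
        "1": some(lambda z: eps(z, x) and eps(z, y)) and at_most_one,
        "2": some(lambda z: eps(z, x) and eps(z, y)
                  and every(lambda v: not eps(v, x) or eps(v, z))),
        "3": some(lambda z: every(lambda v: eps(v, x) == eps(v, z))
                  and eps(z, y)),
    }

def failures():
    """All (form, x, y) for which xy <-> rhs fails in the model."""
    return [(k, x, y) for x in NAMES for y in NAMES
            for k, rhs in rhs_forms(x, y).items() if eps(x, y) != rhs]

print(failures())  # -> []
```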


We start with the system in the language $L_0$, i.e. with ε (conventionally
omitted) as the only specific predicate constant added to the standard language
of FOL.


3 Sequent Calculus 


Elementary ontology will be formalised as a sequent calculus with sequents Γ ⇒ Δ,
which are ordered pairs of finite multisets of formulae called the antecedent
and the succedent, respectively. We will use the calculus G (after Gentzen), which
is essentially the calculus G1 of Troelstra and Schwichtenberg [19]. All necessary
structural rules, including cut, weakening and contraction, are primitive. The
calculus G consists of the rules from Fig. 1.

Fig. 1. Calculus G. In the quantifier rules, a is a fresh parameter (eigenvariable),
not present in Γ, Δ and φ, whereas b is an arbitrary parameter.

Let us recall that formulae displayed in the schemata are active, whereas
the remaining ones are parametric, or form a context. In particular, all active
formulae in the premisses are called side formulae, and the one in the conclusion
is the principal formula of the respective rule application. Proofs are defined in
a standard way as finite trees with nodes labelled by sequents. The height of a
proof D of Γ ⇒ Δ is defined as the number of nodes of the longest branch in D.
⊢ₖ Γ ⇒ Δ means that Γ ⇒ Δ has a proof of height at most k.

G provides an adequate formalization of classical pure FOL (i.e. with no
terms other than variables). However, we should remember that here terms in
quantifier rules are restricted to variables ranging over arbitrary names
(including empty and general ones). This means, in particular, that, unlike in
standard FOL, quantifiers do not have existential import.
Let us call G+LA an extension of G with LA as an additional axiomatic
sequent. The following hold: 


Lemma 2. The following sequents are provable in G+LA:
ab ⇒ ∃x(xa)
ab ⇒ ∀x(xa → xb)
ab ⇒ ∀xy(xa ∧ ya → xy)
∃x(xa), ∀x(xa → xb), ∀xy(xa ∧ ya → xy) ⇒ ab


The proof is obvious. In fact, these sequents together allow us to derive LA 
so we could use them alternatively in a characterization of elementary ontology 
on the basis of G. 

G+LA is certainly an adequate formalization of elementary ontology in the
sense of Słupecki and Iwanuś. However, from the standpoint of proof-theoretic
analysis it is not an interesting form of sequent calculus, and it will be used only
for showing the adequacy of our main system called GO.

To obtain the basic GO we add the following four rules to G:

$$(R)\ \frac{aa,\Gamma\Rightarrow\Delta}{ab,\Gamma\Rightarrow\Delta}\qquad (T)\ \frac{ac,\Gamma\Rightarrow\Delta}{ab,bc,\Gamma\Rightarrow\Delta}\qquad (S)\ \frac{ba,\Gamma\Rightarrow\Delta}{ab,bb,\Gamma\Rightarrow\Delta}$$

$$(E)\ \frac{da,\Gamma\Rightarrow\Delta,dc \qquad dc,\Gamma\Rightarrow\Delta,da \qquad ab,\Gamma\Rightarrow\Delta}{cb,\Gamma\Rightarrow\Delta}$$

where d in (E) is a new parameter (eigenvariable), and a, b, c are arbitrary.
The names of the rules come from reflexivity, transitivity, symmetry and
extensionality. In the case of (R) and (S) it is a kind of prefixed reflexivity and symmetry
(ab → aa, bb → (ab → ba)). Why (E) comes from extensionality will be explained
later.
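Under the same hypothetical set-theoretic reading of ε used only for illustration (names as subsets of a finite domain, ab true iff a denotes exactly one object falling under b), each new rule corresponds to an implication: (R) to ab → aa, (T) to ab ∧ bc → ac, (S) to ab ∧ bb → ba, and (E) to ∀x(xa ↔ xc) ∧ cb → ab, where the first two premisses of (E), with d an eigenvariable, jointly express ∀x(xa ↔ xc). The Python sketch below checks these implications exhaustively in a three-element model; it is a soundness sanity check in one model, not a proof.

```python
from itertools import combinations, product

# Hypothetical model: names are subsets of a 3-element domain;
# ab holds iff a is a singleton contained in b.
DOMAIN = range(3)
NAMES = [frozenset(c) for r in range(len(DOMAIN) + 1)
         for c in combinations(DOMAIN, r)]

def eps(a, b):
    return len(a) == 1 and a <= b

def coextensive(a, c):
    """forall x (xa <-> xc), with x ranging over all names."""
    return all(eps(x, a) == eps(x, c) for x in NAMES)

# Each rule of GO read as an implication between formulas.
PRINCIPLES = {
    "R": lambda a, b, c: not eps(a, b) or eps(a, a),
    "T": lambda a, b, c: not (eps(a, b) and eps(b, c)) or eps(a, c),
    "S": lambda a, b, c: not (eps(a, b) and eps(b, b)) or eps(b, a),
    "E": lambda a, b, c: not (coextensive(a, c) and eps(c, b)) or eps(a, b),
}

def counterexamples():
    """All (rule, a, b, c) falsifying the corresponding implication."""
    return [(k, a, b, c) for k, p in PRINCIPLES.items()
            for a, b, c in product(NAMES, repeat=3) if not p(a, b, c)]

print(counterexamples())  # -> []
```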
We can show that GO is an adequate characterization of elementary ontology. 


Theorem 1. If G+LA ⊢ Γ ⇒ Δ, then GO ⊢ Γ ⇒ Δ.


Proof. It is sufficient to prove that the axiomatic sequent LA is provable in GO. 


For LA→ we derive ab ⇒ ∃x(xa) ∧ ∀x(xa → xb) ∧ ∀xy(xa ∧ ya → xy):

1. aa ⇒ aa (axiom)
2. ab ⇒ aa (R), from 1
3. ab ⇒ ∃x(xa) (⇒∃), from 2
4. cb ⇒ cb (axiom)
5. ca, ab ⇒ cb (T), from 4
6. ab ⇒ ca → cb (⇒→), from 5
7. ab ⇒ ∀x(xa → xb) (⇒∀), from 6
8. ab ⇒ ∃x(xa) ∧ ∀x(xa → xb) (⇒∧), from 3 and 7
9. cd ⇒ cd (axiom)
10. ca, ad ⇒ cd (T), from 9
11. ca, da, aa ⇒ cd (S), from 10
12. ca, da, ab ⇒ cd (R), from 11
13. ab, ca ∧ da ⇒ cd (∧⇒), from 12
14. ab ⇒ ca ∧ da → cd (⇒→), from 13
15. ab ⇒ ∀xy(xa ∧ ya → xy) (⇒∀) twice, from 14
16. ab ⇒ ∃x(xa) ∧ ∀x(xa → xb) ∧ ∀xy(xa ∧ ya → xy) (⇒∧), from 8 and 15

which yields LA→ after (⇒→). A proof of the converse is more complicated (for
readability and to save space, we omit all applications of weakening rules
necessary for the application of two- and three-premiss rules; this convention will be
applied hereafter without comment):

1. da ⇒ da and ca ⇒ ca (axioms)
2. da, ca ⇒ da ∧ ca (⇒∧), from 1
3. da, ca, da ∧ ca → dc ⇒ dc (→⇒), from 2 and dc ⇒ dc
4. da, ca, ∀xy(xa ∧ ya → xy) ⇒ dc (∀⇒) twice, from 3
5. dc, ca ⇒ da (T), from da ⇒ da
6. cb, ca, ∀xy(xa ∧ ya → xy) ⇒ ab (E), from 4, 5 and ab ⇒ ab
7. ca, ca → cb, ∀xy(xa ∧ ya → xy) ⇒ ab (→⇒), from ca ⇒ ca and 6
8. ca, ∀x(xa → xb), ∀xy(xa ∧ ya → xy) ⇒ ab (∀⇒), from 7
9. ∃x(xa), ∀x(xa → xb), ∀xy(xa ∧ ya → xy) ⇒ ab (∃⇒), from 8

It is then routine to prove LA←.


Note that to prove LA→ the rules (R), (T), (S) were sufficient, whereas in
order to derive the converse, (E) alone is not sufficient: we need (T) again.


Theorem 2. If GO ⊢ Γ ⇒ Δ, then G+LA ⊢ Γ ⇒ Δ.


Proof. It is sufficient to prove that the four rules of GO are derivable in G+LA. 
For (T):

1. ab ⇒ ab and ac ⇒ ac (axioms)
2. ab → ac, ab ⇒ ac (→⇒), from 1
3. ∀x(xb → xc), ab ⇒ ac (∀⇒), from 2
4. bc ⇒ ∀x(xb → xc) (provable in G+LA by Lemma 2)
5. ab, bc ⇒ ac (Cut), from 4 and 3
6. ab, bc, Γ ⇒ Δ (Cut), from 5 and the premiss ac, Γ ⇒ Δ of (T)

For (S):

1. bb ⇒ bb and ab ⇒ ab (axioms)
2. bb, ab ⇒ bb ∧ ab (⇒∧), from 1
3. bb ∧ ab → ba, bb, ab ⇒ ba (→⇒), from 2 and ba ⇒ ba
4. ∀xy(xb ∧ yb → xy), bb, ab ⇒ ba (∀⇒) twice, from 3
5. bb ⇒ ∀xy(xb ∧ yb → xy) (provable in G+LA by Lemma 2)
6. bb, bb, ab ⇒ ba (Cut), from 5 and 4
7. bb, ab ⇒ ba (C⇒), from 6

By cut with the premiss ba, Γ ⇒ Δ of (S) we obtain its conclusion.
For (R):

1. ab ⇒ ∃x(xa) (Lemma 2)
2. S := ∃x(xa), ∀xy(xa ∧ ya → xy), ∀x(xa → xa) ⇒ aa (Lemma 2)
3. ∀xy(xa ∧ ya → xy), ∀x(xa → xa), ab ⇒ aa (Cut), from 1 and 2
4. ab ⇒ ∀xy(xa ∧ ya → xy) (Lemma 2)
5. ∀x(xa → xa), ab, ab ⇒ aa (Cut), from 4 and 3
6. ∀x(xa → xa), ab ⇒ aa (C⇒), from 5

where all leaves are provable in G+LA (Lemma 2); in particular, S is the fourth
sequent of Lemma 2 with b replaced by a. By cut with ⇒ ∀x(xa → xa) and the
premiss of (R) we obtain its conclusion.


Since (R), (T), (S) are all derivable in G+LA, we use them in the proof of the
derivability of (E) to simplify matters. Note first the following three proofs, with
weakenings omitted:


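The first proof:

1. cc ⇒ cc (axiom)
2. cb ⇒ cc (R), from 1
3. ca ↔ cc, cb ⇒ ca (↔⇒), from 2 and ca ⇒ ca
4. ca ↔ cc, cb ⇒ ∃x(xa) (⇒∃), from 3
5. ∀x(xa ↔ xc), cb ⇒ ∃x(xa) (∀⇒), from 4

The second proof:

1. db ⇒ db (axiom)
2. dc, cb ⇒ db (T), from 1
3. da ↔ dc, cb, da ⇒ db (↔⇒), from da ⇒ da and 2
4. ∀x(xa ↔ xc), cb, da ⇒ db (∀⇒), from 3
5. ∀x(xa ↔ xc), cb ⇒ da → db (⇒→), from 4
6. ∀x(xa ↔ xc), cb ⇒ ∀x(xa → xb) (⇒∀), from 5

The third proof:

1. de ⇒ de (axiom)
2. dc, ce ⇒ de (T), from 1
3. dc, ec, cc ⇒ de (S), from 2
4. dc, ec, cb ⇒ de (R), from 3
5. dc, ea ↔ ec, cb, ea ⇒ de (↔⇒), from ea ⇒ ea and 4
6. dc, ∀x(xa ↔ xc), cb, ea ⇒ de (∀⇒), from 5
7. da ↔ dc, ∀x(xa ↔ xc), cb, da, ea ⇒ de (↔⇒), from da ⇒ da and 6
8. ∀x(xa ↔ xc), ∀x(xa ↔ xc), cb, da, ea ⇒ de (∀⇒), from 7
9. ∀x(xa ↔ xc), cb, da, ea ⇒ de (C⇒), from 8
10. ∀x(xa ↔ xc), cb, da ∧ ea ⇒ de (∧⇒), from 9
11. ∀x(xa ↔ xc), cb ⇒ da ∧ ea → de (⇒→), from 10
12. ∀x(xa ↔ xc), cb ⇒ ∀xy(xa ∧ ya → xy) (⇒∀) twice, from 11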

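By three cuts with ∃x(xa), ∀x(xa → xb), ∀xy(xa ∧ ya → xy) ⇒ ab and
contractions we obtain a proof of S := ∀x(xa ↔ xc), cb ⇒ ab. Then we finish in
the following way:

1. da, Γ ⇒ Δ, dc and dc, Γ ⇒ Δ, da (the first two premisses of (E))
2. Γ ⇒ Δ, da ↔ dc (⇒↔), from 1
3. Γ ⇒ Δ, ∀x(xa ↔ xc) (⇒∀), from 2, since d is fresh
4. cb, Γ ⇒ Δ, ab (Cut), from 3 and S
5. cb, Γ ⇒ Δ (Cut), from 4 and the third premiss ab, Γ ⇒ Δ of (E), followed by contractions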
Note that to prove derivability of (E) we need in fact the whole LA. We
elaborate on the strength of this rule in Sect. 5. 


4 Cut Elimination 


The possibility of representing LA by means of these four rules makes GO a
calculus with desirable proof-theoretic properties. First of all, note that the cut
elimination theorem holds for G. Since the only primitive rules for ε are all
one-sided, in the sense that principal formulae occur in the antecedents only, we
can easily extend this result to GO. We follow the general strategy of cut
elimination proofs applied originally for hypersequent calculi in Metcalfe, Olivetti
and Gabbay [13], which also works well in the context of standard sequent
calculi (see Indrzejczak [5]). Such a proof has a particularly simple structure and
allows us to avoid many complexities inherent in other methods of proving cut
elimination. In particular, we avoid well-known problems with contraction, since
two auxiliary lemmata deal with this problem in advance. Note first that for GO
the following result holds:


Lemma 3 (Substitution). If ⊢ₖ Γ ⇒ Δ, then ⊢ₖ Γ[a/b] ⇒ Δ[a/b].


Proof. By induction on the height of a proof. Note that (E) may require
relettering similar to that for (∃⇒) and (⇒∀). Note also that the proof provides the
height-preserving admissibility of substitution.


Let us assume that all proofs are regular in the sense that every parameter 
a which is fresh by side condition on the respective rule must be fresh in the 
entire proof, not only on the branch where the application of this rule takes 
place. There is no loss of generality since every proof may be systematically 
transformed into a regular one by the substitution lemma. The following notions 
are crucial for the proof: 


1. The cut-degree is the complexity of the cut-formula φ, i.e. the number of
connectives and quantifiers occurring in φ; it is denoted as dφ.
2. The proof-degree (dD) is the maximal cut-degree in D.

Remember that the complexity of atomic formulae, and consequently of cut-
and proof-degree in the case of atomic cuts, is 0. The proof of the cut elimination
theorem is based on two lemmata which successively make a reduction: first on
the height of the right, and then on the height of the left premiss of cut. φᵏ, Γᵏ
denote k > 0 occurrences of φ, Γ, respectively.


Lemma 4 (Right reduction). Let D₁ ⊢ Γ ⇒ Δ, φ and D₂ ⊢ φᵏ, Π ⇒ Σ with
dD₁, dD₂ < dφ, and φ principal in Γ ⇒ Δ, φ; then we can construct a proof D
such that D ⊢ Γᵏ, Π ⇒ Δᵏ, Σ and dD < dφ.
Proof. By induction on the height of D₂. The basis is trivial, since Γ ⇒ Δ, φ
is identical with Γᵏ, Π ⇒ Δᵏ, Σ. The induction step requires examination of
all cases of possible derivations of φᵏ, Π ⇒ Σ, and of the role of the cut-formula
in the transition. In cases where all occurrences of φ are parametric, we simply
apply the induction hypothesis to the premisses of φᵏ, Π ⇒ Σ and then apply
the respective rule; this is essentially due to the context independence of almost
all rules and the regularity of proofs, which together prevent violation of side
conditions on eigenvariables. If one of the occurrences of φ in the premiss(es) is
a side formula of the last rule, we must additionally apply weakening to restore
the missing formula before the application of the relevant rule.

In cases where one occurrence of φ in φᵏ, Π ⇒ Σ is principal, we make use of
the fact that φ in the left premiss is also principal; for the cases of contraction
and weakening it is trivial. Note that, due to the condition that φ is principal in the
left premiss, it must be compound, since all rules introducing atomic formulae
as principal work only in the antecedents. Hence all cases where one
occurrence of atomic φ in the right premiss would be introduced by means
of (R), (S), (T), (E) are not considered in the proof of this lemma. The only
exceptions are axiomatic sequents Γ ⇒ Δ, φ with principal atomic φ, but they
cause no harm.


Lemma 5 (Left reduction). Let D₁ ⊢ Γ ⇒ Δ, φᵏ and D₂ ⊢ φ, Π ⇒ Σ with dD₁, dD₂ < dφ; then we can construct a proof D such that D ⊢ Γ, Πᵏ ⇒ Δ, Σᵏ and dD < dφ.


Proof. By induction on the height of D₁, but with some important differences. First note that we do not require φ to be principal in φ, Π ⇒ Σ, so this includes the case with φ atomic. In all cases where the occurrences of φ in Γ ⇒ Δ, φᵏ are parametric we just apply the induction hypothesis. This guarantees that even if an atomic cut formula was introduced in the right premiss by one of the rules (R), (S), (T), (E), the reduction of the height is done only on the left premiss, and we always obtain the expected result. Now, in cases where one occurrence of φ in Γ ⇒ Δ, φᵏ is principal, we first apply the induction hypothesis to eliminate all other k − 1 occurrences of φ in the premisses and then we apply the respective rule. Since the only new occurrence of φ is principal, we can make use of the right reduction lemma (Lemma 4) and obtain the result, possibly after some applications of structural rules.


Now we are ready to prove the cut elimination theorem:

Theorem 3. Every proof in GO can be transformed into a cut-free proof.


Proof. By double induction: primary on dD and subsidiary on the number of 
maximal cuts (in the basis and in the inductive step of the primary induction). 
We always take the topmost maximal cut and apply Lemma 5 to it. By successive 
repetition of this procedure we diminish either the degree of a proof or the 
number of cuts in it until we obtain a cut-free proof. 


As a consequence of the cut elimination theorem for GO we obtain: 
Corollary 1. If ⊢ Γ ⇒ Δ, then Γ ⇒ Δ has a proof which is closed under subformulae of Γ ∪ Δ and atomic formulae.


So cut-free GO satisfies the form of the subformula property which holds for 
several elementary theories as formalised by Negri and von Plato [14]. 


5 Modifications 


Construction of rules which are deductively equivalent to axioms may be to some extent automatised (see e.g. Negri and von Plato [14], Braüner [1], or Marin, Miller, Pimentel and Volpe [12]). Still, even the choice of the version of the (equivalent) axiom which will be used for the transformation may have an impact on the quality of the obtained rules. Moreover, very often some additional tuning is necessary to obtain rules which are well-behaved from the proof-theoretic point of view. In this section we will focus briefly on this problem and sketch some alternatives.

In our adequacy proofs we referred to the original formulation of LA, since rules (R), (T), (S) correspond directly, in a modular way, to the three conjuncts of LA→. Our rule (E), however, is modelled not on LA← but rather on the suitable implication of variant 3 of LA from Lemma 1. As a first approximation we can obtain the rule:


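$$\frac{\Gamma \Rightarrow \Delta, \exists x(\forall y(ya \leftrightarrow yx) \wedge xb)}{\Gamma \Rightarrow \Delta, ab}$$

which after further decomposition and quantifier elimination yields:

$$\frac{da, \Gamma \Rightarrow \Delta, dc \qquad dc, \Gamma \Rightarrow \Delta, da \qquad \Gamma \Rightarrow \Delta, cb}{\Gamma \Rightarrow \Delta, ab}$$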


(where d is a new parameter), which is very similar to (E) but with some active atoms in the succedents. This is troublesome for proving cut elimination if ab is a cut formula and a principal formula of (R), (S) or (T) in the right premiss of cut. Fortunately, (E) is interderivable with this rule (this follows from the rule generation theorem in Indrzejczak [5]) and has the principal formula in the antecedent.

It is clear that if we focus on other variants then we can obtain different rules 
by their decomposition. In effect note that instead of (E) we may equivalently 
use the following rules based directly on LA, or on variants 2 and 1 respectively: 


$$(E_{LA})\ \frac{da, \Gamma \Rightarrow \Delta, db \qquad da, ea, \Gamma \Rightarrow \Delta, de \qquad ab, \Gamma \Rightarrow \Delta}{ca, \Gamma \Rightarrow \Delta}$$

$$(E_2)\ \frac{da, \Gamma \Rightarrow \Delta, dc \qquad da, \Gamma \Rightarrow \Delta, cd \qquad ab, \Gamma \Rightarrow \Delta}{ca, cb, \Gamma \Rightarrow \Delta}$$

$$(E_1)\ \frac{da, ea, \Gamma \Rightarrow \Delta, de \qquad ab, \Gamma \Rightarrow \Delta}{ca, cb, \Gamma \Rightarrow \Delta}$$

where d, e are new parameters (eigenvariables).

Note that each of these rules, used instead of (E), yields a variant of GO for which we can also prove cut elimination. However, as we will show by the end of this section, (E) seems to be optimal. Perhaps the last one is the most economical in the sense of branching factor. However, since its left premiss directly corresponds to the condition ∀xy(xa ∧ ya → xy), it introduces two different new parameters into the premisses, which makes it more troublesome in some respects. In fact, if we want to reduce the branching factor, it is possible to replace all these rules by the following variants:


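$$(E')\ \frac{da, \Gamma \Rightarrow \Delta, dc \qquad dc, \Gamma \Rightarrow \Delta, da}{cb, \Gamma \Rightarrow \Delta, ab}$$

$$(E'_{LA})\ \frac{da, \Gamma \Rightarrow \Delta, db \qquad da, ea, \Gamma \Rightarrow \Delta, de}{ca, \Gamma \Rightarrow \Delta, ab}$$

$$(E'_2)\ \frac{da, \Gamma \Rightarrow \Delta, dc \qquad da, \Gamma \Rightarrow \Delta, cd}{ca, cb, \Gamma \Rightarrow \Delta, ab}$$

$$(E'_1)\ \frac{da, ea, \Gamma \Rightarrow \Delta, de}{ca, cb, \Gamma \Rightarrow \Delta, ab}$$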


with the same proviso on the eigenvariables d, e. Their interderivability with the rules stated first is easily obtained by means of the rule generation theorem too. These rules seem to be more convenient for proof search. However, for these primed rules cut elimination cannot be proved in the constructive way, for the reasons mentioned above, and it is an open problem whether cut-free systems with these rules as primitive are complete.

We finish this section by stating the last reason for choosing (E). Let us explain why (E), the most complicated specific rule of GO, was claimed to be connected with extensionality. Consider the following two principles:

$$\mathrm{WE}\qquad \forall x(xa \leftrightarrow xb) \rightarrow \forall x(ax \leftrightarrow bx)$$

$$\mathrm{WExt}\qquad \forall x(xa \leftrightarrow xb) \rightarrow \forall x(\varphi(x, a) \leftrightarrow \varphi(x, b))$$

where φ(x, a) denotes an arbitrary formula with at least one occurrence of x (not bound by any quantifier within φ) and of a.


Lemma 6. WE is equivalent to WExt.


Proof. That WE follows from WExt is obvious, since the former is a specific instance of the latter. The other direction is by induction on the complexity of φ. In the basis there are just two cases: φ(x, a) is either xa or ax; the former is trivial and the latter is just WE. The induction step goes like an ordinary proof of the extensionality principle in FOL.
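As an informal sanity check on LA and WE (not a substitute for the syntactic arguments of Lemmas 6 and 7), one can evaluate both in a small finite model. The sketch below assumes the standard set-theoretic reading of elementary ontology, which is not stated in this section. Names denote arbitrary subsets of a domain, the empty name included, and quantifiers range over all names. Under this reading, a ε b holds exactly when a denotes a single object that falls under b. The three-element domain and all function names are our own choices.

```python
from itertools import combinations

DOMAIN = (0, 1, 2)
# Every subset of the domain is a possible denotation of a name, including the empty name.
NAMES = [frozenset(c) for r in range(len(DOMAIN) + 1) for c in combinations(DOMAIN, r)]


def eps(a: frozenset, b: frozenset) -> bool:
    """a ε b: a denotes exactly one object, and that object falls under b."""
    return len(a) == 1 and a <= b


def la_right_side(a: frozenset, b: frozenset) -> bool:
    """∃x(xa) ∧ ∀x(xa → xb) ∧ ∀xy(xa ∧ ya → xy)."""
    exists = any(eps(x, a) for x in NAMES)
    inclusion = all(eps(x, b) for x in NAMES if eps(x, a))
    uniqueness = all(eps(x, y) for x in NAMES for y in NAMES if eps(x, a) and eps(y, a))
    return exists and inclusion and uniqueness


def la_holds() -> bool:
    """LA: ab ↔ (right-hand side), checked for every pair of names a, b."""
    return all(eps(a, b) == la_right_side(a, b) for a in NAMES for b in NAMES)


def we_holds() -> bool:
    """WE: ∀x(xa ↔ xb) → ∀x(ax ↔ bx), checked for every pair of names a, b."""
    for a in NAMES:
        for b in NAMES:
            antecedent = all(eps(x, a) == eps(x, b) for x in NAMES)
            consequent = all(eps(a, x) == eps(b, x) for x in NAMES)
            if antecedent and not consequent:
                return False
    return True


print(la_holds(), we_holds())   # True True
```

Both checks succeed in this model; a failing check for some other interpretation of ε would show that it is not a model of LA or WE.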


Lemma 7. In G, (E) is equivalent to WE.

Proof. Note first that in G the following sequents are provable:

- ∀x(ax ↔ cx), cb ⇒ ab
- ∀x(xa ↔ xc), da ⇒ dc
- ∀x(xa ↔ xc), dc ⇒ da

We will use them in the proofs to follow.


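For derivability of (E):

$$\dfrac{\dfrac{\dfrac{\dfrac{da, \Gamma \Rightarrow \Delta, dc \qquad dc, \Gamma \Rightarrow \Delta, da}{\Gamma \Rightarrow \Delta, da \leftrightarrow dc}\,(\Rightarrow\leftrightarrow)}{\Gamma \Rightarrow \Delta, \forall x(xa \leftrightarrow xc)}\,(\Rightarrow\forall) \qquad \mathcal{D}}{\Gamma \Rightarrow \Delta, \forall x(ax \leftrightarrow cx)}\,(\mathrm{Cut}) \qquad \forall x(ax \leftrightarrow cx), cb \Rightarrow ab}{cb, \Gamma \Rightarrow \Delta, ab}\,(\mathrm{Cut})$$

where 𝒟 is a proof of ∀x(xa ↔ xc) ⇒ ∀x(ax ↔ cx) from WE, and the rightmost sequent is provable. Cutting the endsequent with ab, Γ ⇒ Δ yields the conclusion of (E).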
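Provability of WE in G with (E):

$$\dfrac{\forall x(xa \leftrightarrow xc), da \Rightarrow dc \qquad \forall x(xa \leftrightarrow xc), dc \Rightarrow da \qquad ab \Rightarrow ab}{\forall x(xa \leftrightarrow xc), cb \Rightarrow ab}\,(E)$$

In the same way we prove ∀x(xa ↔ xc), ab ⇒ cb, which by (⇒↔), (⇒∀) and (⇒→) yields WE.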


This shows that we can obtain the axiomatization of elementary ontology by means of LA→ and WE (or WExt). Also, instead of LA→ we can use three axioms corresponding to our three rules (R), (S), (T). Note that if we get rid of (E) (or WE) we obtain a weaker version of ontology investigated by Takano [18]. If we get rid of the quantifier rules we obtain a quantifier-free version of this system investigated by Ishimoto and Kobayashi [6].

On the basis of the specific features of sequent calculus we can also obtain here, for free, the intuitionistic version of ontology. As is well known, it is sufficient to restrict the rules of G to sequents having at most one formula in the succedent (which requires small modifications, like the replacement of (↔⇒) and (⇒∨) with two variants always having one side formula in the succedent) to obtain the version adequate for intuitionistic FOL. Since all specific rules for ε can be restricted in a similar way, we can obtain the calculus GIO for the intuitionistic version of elementary ontology. One can easily check that all proofs showing the adequacy of GO and the cut elimination theorem are either intuitionistically correct or can be easily changed into such proofs. The latter remark concerns those proofs in which the classical version of (↔⇒) required the introduction of the second side formula into the succedent by (⇒W); the two intuitionistic versions of (↔⇒) do not require this step.


6 Extensions 


Leśniewski and his followers often worked on ontology enriched with definitions of special predicates and name-creating functors. In this section we focus on a number of unary and binary predicates which are popular ontological constants. Instead of adding these definitions to GO, we will introduce predicates by means of sequent rules satisfying the conditions formulated for well-behaved SC rules. Let us call L_P the language of L_O enriched with all these predicates, and GOP the calculus with the additional rules for predicates. The definitions of the most important unary predicates are:


$$Da := \exists x(xa) \qquad Va := \neg\exists x(xa)$$

$$Sa := \exists x(ax) \qquad Ga := \exists xy(xa \wedge ya \wedge \neg xy)$$

D, V, S, G are unary predicates informing that a is denoting, empty (or void), singular or general, respectively. D and S are Leśniewski's ex and ob respectively. He also preferred to apply sol(a), which we symbolize with U (for unique):

$$Ua := \forall xy(xa \wedge ya \rightarrow xy) \qquad [\text{or simply } \neg Ga]$$
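To see what these definitions express, one can evaluate them in a small hypothetical finite model. This is an assumption, not the paper's semantics: names denote subsets of a three-element domain, and a ε b holds iff a is a singleton included in b. In that model D holds of nonempty names and V of the empty name. S holds of singular names, G of names with at least two objects, and U of names with at most one, so U coincides with ¬G. The Python function names simply mirror the predicate letters.

```python
from itertools import combinations

DOMAIN = (0, 1, 2)
# Every subset of the domain is a possible denotation of a name, including the empty name.
NAMES = [frozenset(c) for r in range(len(DOMAIN) + 1) for c in combinations(DOMAIN, r)]


def eps(a, b):
    """a ε b: a denotes exactly one object, and that object falls under b."""
    return len(a) == 1 and a <= b


def D(a):  # Da := ∃x(xa), "a is denoting"
    return any(eps(x, a) for x in NAMES)


def V(a):  # Va := ¬∃x(xa), "a is empty"
    return not D(a)


def S(a):  # Sa := ∃x(ax), "a is singular"
    return any(eps(a, x) for x in NAMES)


def G(a):  # Ga := ∃xy(xa ∧ ya ∧ ¬xy), "a is general"
    return any(eps(x, a) and eps(y, a) and not eps(x, y) for x in NAMES for y in NAMES)


def U(a):  # Ua := ∀xy(xa ∧ ya → xy), "a is unique"
    return all(eps(x, y) for x in NAMES for y in NAMES if eps(x, a) and eps(y, a))


for a in NAMES:
    print(sorted(a), D(a), V(a), S(a), G(a), U(a))
```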


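The additional rules for these predicates are of the form:

$$(D\Rightarrow)\ \frac{ba, \Gamma \Rightarrow \Delta}{Da, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow D)\ \frac{\Gamma \Rightarrow \Delta, ca}{\Gamma \Rightarrow \Delta, Da} \qquad (S\Rightarrow)\ \frac{ab, \Gamma \Rightarrow \Delta}{Sa, \Gamma \Rightarrow \Delta}$$

$$(\Rightarrow S)\ \frac{\Gamma \Rightarrow \Delta, ac}{\Gamma \Rightarrow \Delta, Sa} \qquad (V\Rightarrow)\ \frac{\Gamma \Rightarrow \Delta, ca}{Va, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow V)\ \frac{ba, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, Va}$$

where b is new and c arbitrary in all schemata.

$$(G\Rightarrow)\ \frac{ba, ca, \Gamma \Rightarrow \Delta, bc}{Ga, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow G)\ \frac{\Gamma \Rightarrow \Delta, da \qquad \Gamma \Rightarrow \Delta, ea \qquad de, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, Ga}$$

$$(\Rightarrow U)\ \frac{ba, ca, \Gamma \Rightarrow \Delta, bc}{\Gamma \Rightarrow \Delta, Ua} \qquad (U\Rightarrow)\ \frac{\Gamma \Rightarrow \Delta, da \qquad \Gamma \Rightarrow \Delta, ea \qquad de, \Gamma \Rightarrow \Delta}{Ua, \Gamma \Rightarrow \Delta}$$

where b, c are new, and d, e are arbitrary parameters.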
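The binary predicates of identity, (weak and strong) coextensiveness, nonbeing, subsumption and antisubsumption are defined in the following way:

$$a = b := ab \wedge ba \qquad a \not\varepsilon\, b := aa \wedge \neg ab$$

$$a \simeq b := \forall x(xa \leftrightarrow xb) \qquad a \subset b := \forall x(xa \rightarrow xb)$$

$$a \approx b := a \simeq b \wedge Da \qquad a \not\subset b := \forall x(xa \rightarrow \neg xb)$$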


Finally, note that Aristotelian categorical sentences can also be defined in Leśniewski's ontology:

$$aAb := a \subset b \wedge Da \qquad aEb := a \not\subset b \wedge Da$$

$$aIb := \exists x(xa \wedge xb) \qquad aOb := \exists x(xa \wedge \neg xb)$$
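The binary definitions can be sanity-checked in a small hypothetical finite model. As an assumption, names denote subsets of a three-element domain and a ε b holds iff a is a singleton included in b. In that model each definition collapses to a familiar set relation. Identity becomes equality of singular names, and weak coextensiveness becomes equality of extensions. Subsumption becomes inclusion and antisubsumption becomes disjointness. The Aristotelian A and E become inclusion and disjointness with existential import. The function names are ours; the prefix cat_ only avoids a clash with the ε-rule (E).

```python
from itertools import combinations

DOMAIN = (0, 1, 2)
# Every subset of the domain is a possible denotation of a name, including the empty name.
NAMES = [frozenset(c) for r in range(len(DOMAIN) + 1) for c in combinations(DOMAIN, r)]


def eps(a, b):
    """a ε b: a denotes exactly one object, and that object falls under b."""
    return len(a) == 1 and a <= b


def denoting(a):            # Da := ∃x(xa)
    return any(eps(x, a) for x in NAMES)


def identity(a, b):         # a = b := ab ∧ ba
    return eps(a, b) and eps(b, a)


def nonbeing(a, b):         # aa ∧ ¬ab
    return eps(a, a) and not eps(a, b)


def weak_coext(a, b):       # ∀x(xa ↔ xb)
    return all(eps(x, a) == eps(x, b) for x in NAMES)


def strong_coext(a, b):     # weak coextensiveness ∧ Da
    return weak_coext(a, b) and denoting(a)


def subsumption(a, b):      # ∀x(xa → xb)
    return all(eps(x, b) for x in NAMES if eps(x, a))


def antisubsumption(a, b):  # ∀x(xa → ¬xb)
    return all(not eps(x, b) for x in NAMES if eps(x, a))


def cat_A(a, b):            # aAb := a ⊂ b ∧ Da
    return subsumption(a, b) and denoting(a)


def cat_E(a, b):            # aEb := antisubsumption ∧ Da
    return antisubsumption(a, b) and denoting(a)


def cat_I(a, b):            # aIb := ∃x(xa ∧ xb)
    return any(eps(x, a) and eps(x, b) for x in NAMES)


def cat_O(a, b):            # aOb := ∃x(xa ∧ ¬xb)
    return any(eps(x, a) and not eps(x, b) for x in NAMES)
```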


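The rules for binary predicates:

$$(=\Rightarrow)\ \frac{ab, ba, \Gamma \Rightarrow \Delta}{a = b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow =)\ \frac{\Gamma \Rightarrow \Delta, ab \qquad \Gamma \Rightarrow \Delta, ba}{\Gamma \Rightarrow \Delta, a = b}$$

$$(\simeq\Rightarrow)\ \frac{\Gamma \Rightarrow \Delta, ca, cb \qquad ca, cb, \Gamma \Rightarrow \Delta}{a \simeq b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow\simeq)\ \frac{da, \Gamma \Rightarrow \Delta, db \qquad db, \Gamma \Rightarrow \Delta, da}{\Gamma \Rightarrow \Delta, a \simeq b}$$

$$(\approx\Rightarrow)\ \frac{da, \Gamma \Rightarrow \Delta, ca, cb \qquad ca, cb, da, \Gamma \Rightarrow \Delta}{a \approx b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow\approx)\ \frac{da, \Gamma \Rightarrow \Delta, db \qquad db, \Gamma \Rightarrow \Delta, da \qquad \Gamma \Rightarrow \Delta, ca}{\Gamma \Rightarrow \Delta, a \approx b}$$

$$(\not\varepsilon\Rightarrow)\ \frac{aa, \Gamma \Rightarrow \Delta, ab}{a \not\varepsilon\, b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow\not\varepsilon)\ \frac{\Gamma \Rightarrow \Delta, aa \qquad ab, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, a \not\varepsilon\, b}$$

$$(\subset\Rightarrow)\ \frac{\Gamma \Rightarrow \Delta, ca \qquad cb, \Gamma \Rightarrow \Delta}{a \subset b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow\subset)\ \frac{da, \Gamma \Rightarrow \Delta, db}{\Gamma \Rightarrow \Delta, a \subset b}$$

$$(\not\subset\Rightarrow)\ \frac{\Gamma \Rightarrow \Delta, ca \qquad \Gamma \Rightarrow \Delta, cb}{a \not\subset b, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow\not\subset)\ \frac{da, db, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, a \not\subset b}$$

$$(A\Rightarrow)\ \frac{da, \Gamma \Rightarrow \Delta, ca \qquad cb, da, \Gamma \Rightarrow \Delta}{aAb, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow A)\ \frac{da, \Gamma \Rightarrow \Delta, db \qquad \Gamma \Rightarrow \Delta, ca}{\Gamma \Rightarrow \Delta, aAb}$$

$$(E\Rightarrow)\ \frac{da, \Gamma \Rightarrow \Delta, ca \qquad da, \Gamma \Rightarrow \Delta, cb}{aEb, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow E)\ \frac{da, db, \Gamma \Rightarrow \Delta \qquad \Gamma \Rightarrow \Delta, ca}{\Gamma \Rightarrow \Delta, aEb}$$

$$(I\Rightarrow)\ \frac{da, db, \Gamma \Rightarrow \Delta}{aIb, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow I)\ \frac{\Gamma \Rightarrow \Delta, ca \qquad \Gamma \Rightarrow \Delta, cb}{\Gamma \Rightarrow \Delta, aIb}$$

$$(O\Rightarrow)\ \frac{da, \Gamma \Rightarrow \Delta, db}{aOb, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow O)\ \frac{\Gamma \Rightarrow \Delta, ca \qquad cb, \Gamma \Rightarrow \Delta}{\Gamma \Rightarrow \Delta, aOb}$$

where d is new and c arbitrary (but c can be identical to d in the rules for ≈, A, E).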

Proofs of interderivability with the equivalences corresponding to suitable definitions are trivial in most cases. We provide only one for the sake of illustration. The hardest case is ≈.


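$$\dfrac{\dfrac{\dfrac{da, ca \Rightarrow ca, cb \qquad da, ca, cb \Rightarrow cb}{a \approx b, ca \Rightarrow cb}\,(\approx\Rightarrow) \qquad \dfrac{da, cb \Rightarrow ca, cb \qquad da, ca, cb \Rightarrow ca}{a \approx b, cb \Rightarrow ca}\,(\approx\Rightarrow)}{a \approx b \Rightarrow ca \leftrightarrow cb}\,(\Rightarrow\leftrightarrow)}{a \approx b \Rightarrow \forall x(xa \leftrightarrow xb)}\,(\Rightarrow\forall)$$

and

$$\dfrac{\dfrac{ca \Rightarrow ca, aa, ab}{ca \Rightarrow \exists x(xa), aa, ab}\,(\Rightarrow\exists) \qquad \dfrac{ca, aa, ab \Rightarrow ca}{ca, aa, ab \Rightarrow \exists x(xa)}\,(\Rightarrow\exists)}{a \approx b \Rightarrow \exists x(xa)}\,(\approx\Rightarrow)$$

by (⇒∧) yield one part. For the second:

$$\dfrac{\dfrac{\dfrac{\forall x(xa \leftrightarrow xb), da \Rightarrow db \qquad \forall x(xa \leftrightarrow xb), db \Rightarrow da \qquad ca \Rightarrow ca}{\forall x(xa \leftrightarrow xb), ca \Rightarrow a \approx b}\,(\Rightarrow\approx)}{\forall x(xa \leftrightarrow xb), \exists x(xa) \Rightarrow a \approx b}\,(\exists\Rightarrow)}{\forall x(xa \leftrightarrow xb) \wedge \exists x(xa) \Rightarrow a \approx b}\,(\wedge\Rightarrow)$$

where the left and the middle premisses are obviously provable by means of (∀⇒), (↔⇒). We omit the proofs of derivability of both rules in GO enriched with the axiom ⇒ ∀x(xa ↔ xb) ∧ ∃x(xa) ↔ a ≈ b.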

We treat all these predicates as new constants; hence their complexity is fixed as 1, in contrast to atomic formulae, which are of complexity 0. Of course, we can consider ontology with an arbitrary selection of these predicates according to the needs. Accordingly, we can also enrich GO with an arbitrary selection of the suitable rules for predicates. All the results holding for GOP are correct for any subsystem. Let us list some important features of these rules and of enriched GO:


1. All rules for predicates are explicit, separate and symmetric, which are the usual requirements for well-behaved rules in sequent calculi (see e.g. [5]). In this respect they are similar to the rules for logical constants and differ from the specific rules for ε, which are one-sided (in the sense of always having principal formulae in the antecedent).

2. All these new rules satisfy the subformula property in the sense that side 
formulae are only atomic. 

3. The substitution lemma holds for GO with any combination of the above 
rules. 

4. All rules are pairwise reductive, modulo substitution of terms.


We do not prove the substitution lemma, since the proof is standard, but we comment on the last point, since cut elimination holds due to 3 and 4. The notion of reductivity for sequent rules was introduced by Ciabattoni [2], and it may be roughly defined as follows: a pair of introduction rules (⇒∗), (∗⇒) for a constant ∗ is reductive if an application of cut on cut formulae introduced by these rules may be replaced by a series of cuts made on less complex formulae, in particular on their subformulae. Basically, it enables the reduction of the cut-degree in the proof of cut elimination. Again we illustrate the point with respect to the most complicated case. Let us consider the application of cut with the cut formula a ≈ b; then the left premiss of this cut was obtained by:


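$$(\Rightarrow\approx)\ \frac{ca, \Gamma \Rightarrow \Delta, cb \qquad cb, \Gamma \Rightarrow \Delta, ca \qquad \Gamma \Rightarrow \Delta, da}{\Gamma \Rightarrow \Delta, a \approx b}$$

where c is new and d is arbitrary. And the right premiss was obtained by:

$$(\approx\Rightarrow)\ \frac{ea, \Pi \Rightarrow \Sigma, fa, fb \qquad ea, fa, fb, \Pi \Rightarrow \Sigma}{a \approx b, \Pi \Rightarrow \Sigma}$$

where e is new and f is arbitrary.

By the substitution lemma applied to the premisses of (⇒≈), (≈⇒) we obtain:

1. fa, Γ ⇒ Δ, fb
2. fb, Γ ⇒ Δ, fa
3. da, Π ⇒ Σ, fa, fb
4. da, fa, fb, Π ⇒ Σ

and we can derive:

$$\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{\Gamma \Rightarrow \Delta, da \qquad da, \Pi \Rightarrow \Sigma, fa, fb}{\Gamma, \Pi \Rightarrow \Delta, \Sigma, fa, fb}\,(\mathrm{Cut}) \qquad fb, \Gamma \Rightarrow \Delta, fa}{\Gamma, \Gamma, \Pi \Rightarrow \Delta, \Delta, \Sigma, fa, fa}\,(\mathrm{Cut})}{\Gamma, \Pi \Rightarrow \Delta, \Sigma, fa}\,(\mathrm{C}) \qquad \mathcal{D}}{\Gamma, \Gamma, \Pi, \Pi \Rightarrow \Delta, \Delta, \Sigma, \Sigma}\,(\mathrm{Cut})}{\Gamma, \Pi \Rightarrow \Delta, \Sigma}\,(\mathrm{C})$$

where 𝒟 is a similar proof of fa, Γ, Π ⇒ Δ, Σ from Γ ⇒ Δ, da, 4 and 1 by cuts and contractions. All cuts are of lower degree than the original cut. It is a routine exercise to check that all rules for predicates are reductive, and this is sufficient for proving Lemmata 4 and 5 for GOP. As a consequence we obtain: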


Theorem 4. Every proof in GOP can be transformed into a cut-free proof.


Since the rules are modular this holds for every subsystem based on a selec- 
tion of the above rules. 


7 Conclusion 


Both the basic system GO and its extension GOP are cut-free and satisfy a form of the subformula property. This shows that Leśniewski's ontology admits standard proof-theoretical study and allows us to obtain reasonable results. In particular, we can prove for GO the interpolation theorem using the Maehara strategy (see e.g. [19]), and this implies for GO other expected results, such as Beth's definability theorem. Space restrictions prevent us from presenting it here. On the other hand, we restricted our study to the system with simple names only, whereas a fuller study should also cover complex names built with the help of several name-forming functors. The typical ones are the counterparts of the well-known class operations definable in Leśniewski's ontology in the following way:


$$a\bar{b} := aa \wedge \neg ab \qquad a(b \cap c) := ab \wedge ac \qquad a(b \cup c) := ab \vee ac$$


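It is not a problem to provide suitable rules corresponding to these definitions:

$$(\bar{\ }\Rightarrow)\ \frac{aa, \Gamma \Rightarrow \Delta, ab}{a\bar{b}, \Gamma \Rightarrow \Delta} \qquad (\Rightarrow\bar{\ })\ \frac{ab, \Gamma \Rightarrow \Delta \qquad \Gamma \Rightarrow \Delta, aa}{\Gamma \Rightarrow \Delta, a\bar{b}}$$

$$(\cap\Rightarrow)\ \frac{ab, ac, \Gamma \Rightarrow \Delta}{a(b \cap c), \Gamma \Rightarrow \Delta} \qquad (\Rightarrow\cap)\ \frac{\Gamma \Rightarrow \Delta, ab \qquad \Gamma \Rightarrow \Delta, ac}{\Gamma \Rightarrow \Delta, a(b \cap c)}$$

$$(\cup\Rightarrow)\ \frac{ab, \Gamma \Rightarrow \Delta \qquad ac, \Gamma \Rightarrow \Delta}{a(b \cup c), \Gamma \Rightarrow \Delta} \qquad (\Rightarrow\cup)\ \frac{\Gamma \Rightarrow \Delta, ab, ac}{\Gamma \Rightarrow \Delta, a(b \cup c)}$$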


Although their structure is similar to the rules provided for predicates in the last section, their addition raises important problems. One is of a more general nature and well known: definitions of term-forming operations in ontology are creative. Although this was intended in the original architecture of Leśniewski's systems, in the modern approach it is not welcome. Iwanuś [7] has shown that the problem can be overcome by enriching elementary ontology with two axioms corresponding to special versions of the comprehension axiom, but this opens the problem of derivability of these axioms in GO enriched with special rules.

There is also a specific problem with cut elimination for GO with added complex terms and suitable rules. Even if they are reductive (and the rules stated above are reductive, as the reader can check), we run into a problem with the quantifier rules. If unrestricted instantiation of terms is admitted in (⇒∃), (∀⇒), the subformula property is lost. One can find some solutions to this problem, for example by using two separate measures of complexity for formula-makers and term-makers (see e.g. [3]), or by restricting in some way the instantiation of terms in the respective quantifier rules (see e.g. [4]). The examination of these possibilities is left for further study.

The last open problem deserving careful study is the possibility of application 
for automated proof-search and obtaining semi-decision procedures (or decision 
procedures for quantifier-free subsystems) on the basis of the provided sequent 
calculus. In particular, due to modularity of provided rules, one could obtain in 
this way decision procedures for several quantifier-free subsystems investigated 
by Pietruszczak [15], or by Ishimoto and Kobayashi [6]. 
Abstract. A strategy schedule allocates time to proof strategies that
are used in sequence in a theorem prover. We employ Bayesian statistics 
to propose alternative sequences for the strategy schedule in each proof 
attempt. Tested on the TPTP problem library, our method yields a time 
saving of more than 50%. By extending this method to optimize the 
fixed time allocations to each strategy, we obtain a notable increase in 
the number of theorems proved. 


Keywords: Bayesian machine learning · Strategy scheduling · Automated theorem proving


1 Introduction 


Theorem provers have wide-ranging applications, including formal verification 
of large mathematical proofs [9] and reasoning in knowledge-bases [37]. Thus, 
improvements in provers that lead to more successful proofs, and savings in the 
time taken to discover proofs, are desirable. 

Automated theorem provers generate proofs by utilizing inference procedures 
in combination with heuristic search. A specific configuration of a prover, which 
may be specialized for a certain class of problems, is termed a strategy. Provers 
such as E [27] can select from a portfolio of strategies to solve the goal theorem. 
Furthermore, certain provers hedge their allocated proof time across a number 
of proof strategies by use of a strategy schedule, which specifies a time allocation 
for each strategy and the sequence in which they are used until one proves the 
goal theorem. This method was pioneered in the Gandalf prover [33]. 

Prediction of the effectiveness of a strategy prior to a proof attempt is usually 
intractable or undecidable [12]. A practical implementation must infer such a 
prediction by tractable approximations. Therefore, machine learning methods 
for strategy invention, selection and scheduling are actively researched. Machine 
learning methods for strategy selection conditioned on the proof goal have shown 
promising results [3]. Good results have also been reported for strategy synthesis 
using machine learning [1]. Work on machine learning for algorithm portfolios, which allocate resources to multiple solvers simultaneously, is also relevant to strategy scheduling because of its similar goals. For this purpose, Silverthorn and Miikkulainen propose latent class models [31].
In this work, we present a method for generating strategy schedules using Bayesian learning with two primary goals: to reduce proving time or to prove more theorems. We have evaluated this method for both purposes using iLeanCoP, an intuitionistic first-order logic prover with a compact implementation and good performance [18]. Intuitionistic logic is a non-standard form of first-order logic, of which relatively little is known with regard to automation. It is of interest in theoretical computer science and philosophy of mathematics [7]. Among intuitionistic provers, iLeanCoP is seen as impressive and is able to prove a sufficient number of theorems in our benchmarks for significance testing. Its core is implemented in around thirty lines of Prolog; such simplicity adds clarity to interpretations of our results. Our method was benchmarked on the Thousands of Problems for Theorem Provers (TPTP) problem library [32], in which we are able to save more than 50% on proof time when aiming for the former goal. Towards the latter goal, we are able to prove notably more theorems.

Our two primary, complementary, contributions presented here are: first, a 
Bayesian machine learning model for strategy scheduling; and second, engineered 
features for use in that model. The text below is organized as follows. In Sect. 2, 
we introduce preliminary material used subsequently to construct a machine 
learning model for strategy scheduling, described in Sects. 3-7. The data used to 
train and evaluate this model are described in Sect. 8, followed by experiments,
results and conclusions in Sects. 9-12. 


2 Distribution of Permutations 


We model a strategy schedule using a vector of strategies, and thus all schedules 
are permutations of the same. 


Definition 1 (Permutation). Let $M \in \mathbb{N}$. A permutation $\pi \in \mathbb{N}^M$ is a vector of indices, with $\pi_i \in \{1, \dots, M\}$ and $\forall i \neq j : \pi_i \neq \pi_j$, representing a reordering of the components of an M-dimensional vector s to $[s_{\pi_1}, s_{\pi_2}, \dots, s_{\pi_M}]^T$.


In this text, vector-valued variables, such as π above, are set in boldface, which is dropped when they are indexed, as in $\pi_i$. For probabilistic modelling of schedules represented using permutations, we use the Plackett-Luce model [14,21] to define a parametric probability distribution over permutations.


Definition 2 (Plackett-Luce distribution). The Plackett-Luce distribution Perm(λ) with parameter $\lambda \in \mathbb{R}^M_{>0}$ has support over permutations of indices {1, ..., M}. For a permutation Π distributed as Perm(λ),

$$\Pr(\Pi = \pi; \lambda) = \prod_{j=1}^{M} \frac{\lambda_{\pi_j}}{\sum_{u=j}^{M} \lambda_{\pi_u}}.$$


In later sections, we use the parameter λ to assign an abstract ‘score’ to
strategies when modelling distributions over schedules. This score is particularly 
useful due to the following theorem. 
Theorem 1. Let π* be a mode of the distribution Perm(λ), that is

$$\pi^* = \arg\max_{\pi} \Pr(\pi; \lambda).$$

Then $\lambda_{\pi^*_1} \geq \lambda_{\pi^*_2} \geq \dots \geq \lambda_{\pi^*_M}$.

Thus, assuming λ is the vector of strategy scores, the highest probability permutation indexes the strategies in decreasing order of scores. Conversely, the highest probability permutation can be obtained efficiently by sorting the indices of λ with respect to their corresponding values in decreasing order. Cao et al. [4] have presented a proof of Theorem 1, and Cheng et al. [5] have discussed some further interesting details.
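Theorem 1 can be checked numerically by comparing the sorting shortcut with brute-force enumeration of all M! permutations. The function names below are hypothetical. The enumeration is only feasible for small M and is included purely as a cross-check. The score vector is made up.

```python
from itertools import permutations


def pl_probability(pi, lam):
    """Plackett-Luce probability of permutation pi (1-based indices) under scores lam."""
    ordered = [lam[i - 1] for i in pi]
    p = 1.0
    for j, score in enumerate(ordered):
        p *= score / sum(ordered[j:])   # pick ordered[j] among the items not yet placed
    return p


def mode_by_sorting(lam):
    """Theorem 1: the mode sorts indices by decreasing score, O(M log M)."""
    return sorted(range(1, len(lam) + 1), key=lambda i: lam[i - 1], reverse=True)


def mode_by_enumeration(lam):
    """Brute force over all M! permutations."""
    best = max(permutations(range(1, len(lam) + 1)), key=lambda pi: pl_probability(pi, lam))
    return list(best)


lam = [0.5, 3.0, 1.2, 2.2]
print(mode_by_sorting(lam), mode_by_enumeration(lam))   # [2, 4, 3, 1] [2, 4, 3, 1]
```

With distinct scores the mode is unique. Swapping any adjacent pair that is out of order strictly increases the probability, so the enumeration cannot find a different maximizer.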
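Example 1. Let $\lambda = [1, 9]^T$, $\pi^{(1)} = [1, 2]^T$ and $\pi^{(2)} = [2, 1]^T$. Then,

$$\Pr(\Pi = \pi^{(1)}; \lambda) = \frac{\lambda_{\pi^{(1)}_1}}{\lambda_{\pi^{(1)}_1} + \lambda_{\pi^{(1)}_2}} \cdot \frac{\lambda_{\pi^{(1)}_2}}{\lambda_{\pi^{(1)}_2}} = \frac{1}{1 + 9} \cdot \frac{9}{9} = \frac{1}{10}.$$

Similarly, $\Pr(\Pi = \pi^{(2)}; \lambda) = 9/10$.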
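Definition 2 and Example 1 can be reproduced with exact rational arithmetic. The function name pl_probability is ours. For each position j, the loop multiplies by the score of the strategy placed there, divided by the total score of the strategies not yet placed.

```python
from fractions import Fraction


def pl_probability(pi, lam):
    """Pr(Π = π; λ) = ∏_j λ_{π_j} / Σ_{u ≥ j} λ_{π_u}, with 1-based indices as in the text."""
    ordered = [lam[i - 1] for i in pi]
    p = Fraction(1)
    for j, score in enumerate(ordered):
        p *= Fraction(score) / sum(ordered[j:])
    return p


lam = [1, 9]
print(pl_probability([1, 2], lam))   # 1/10
print(pl_probability([2, 1], lam))   # 9/10
```

Summing pl_probability over all permutations of any positive score vector gives exactly 1, which is a quick check that the product really defines a distribution.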


Theorem 2. Perm(cλ) = Perm(λ), for any scalar constant c > 0.

In other words, the Plackett-Luce distribution is invariant to the scale of the parameter vector.


Lemma 1. Perm(exp(λ + c)) = Perm(exp(λ)), for any scalar constant c ∈ ℝ.

Lemma 1 follows from Theorem 2, since exp(λ + c) = e^c exp(λ) with e^c > 0, and shows that the same distribution is translation invariant if the parameter is exponentiated. Cao et al. [4] give proofs of both.
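Both invariances can be confirmed numerically by comparing the probabilities of every permutation before and after the transformation. This is a sketch with hypothetical names and made-up score vectors. The differences printed are at the level of floating-point rounding.

```python
import math
from itertools import permutations


def pl_probability(pi, lam):
    """Plackett-Luce probability of permutation pi (1-based indices) under scores lam."""
    ordered = [lam[i - 1] for i in pi]
    p = 1.0
    for j, score in enumerate(ordered):
        p *= score / sum(ordered[j:])
    return p


def max_difference(lam_a, lam_b):
    """Largest gap between the two Plackett-Luce distributions over all permutations."""
    perms = permutations(range(1, len(lam_a) + 1))
    return max(abs(pl_probability(pi, lam_a) - pl_probability(pi, lam_b)) for pi in perms)


lam = [0.5, 3.0, 1.2]
print(max_difference(lam, [7.0 * v for v in lam]))        # Theorem 2 with c = 7

theta, c = [0.3, -1.0, 2.0], 4.2
print(max_difference([math.exp(t) for t in theta],
                     [math.exp(t + c) for t in theta]))    # Lemma 1 with c = 4.2
```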


3 A Maximum Likelihood Model 


We model a strategy schedule as a ranking of known strategies, where each strat- 
egy is constructed by a parameter setting and time allocation. A ranking therein 
is a permutation of strategies, with each strategy retaining its time allocation 
irrespective of the ordering. We construct, in this section, a model for inference 
of such permutations that is linear in the parameters. 

Suppose we have a repository of N theorems which we test against each of our M known strategies to build a data-set $\mathcal{D} = \{(\pi^{(i)}, x^{(i)})\}_{i=1}^{N}$, where $\pi^{(i)}$ is a desirable ordering of strategies for theorem i and $x^{(i)}$ is a feature vector representation of the theorem. In Sect. 9, we detail how we instantiated 𝒟 for our experiments, which serves as an example for any other implementation. We assume that $\pi^{(i)}$ has a Plackett-Luce distribution conditioned on $x^{(i)}$ such that

$$\Pr(\pi; x, w) = \mathrm{Perm}(\lambda(x, w)), \tag{1}$$

where w is a parameter the model must learn and λ(·) is a vector-valued function of range $\mathbb{R}^M_{>0}$. We use the notation $\lambda(\cdot)_j$ to index into the value of λ(·). We represent our prover strategies with feature vectors $\{d^{(j)}\}_{j=1}^{M}$. To calculate the score of strategy j using $\lambda(\cdot)_j$, we specify

$$\lambda(x, w)_j = \exp\big(\phi(x, d^{(j)})^{T} w\big) \tag{2}$$

to ensure that the scores are positive valued, where φ is a suitable basis expansion function. Assuming the data are i.i.d., the likelihood of the parameter vector is given by

$$\mathcal{L}(w) = p(\mathcal{D}; w) = \prod_{i=1}^{N} \Pr(\pi^{(i)}; x^{(i)}, w). \tag{3}$$


A maximizer ŵ of this likelihood can then be used to forecast the distribution over permutations for a new theorem x* by evaluating Perm(λ(x*, ŵ)) for all permutations. This would incur factorial complexity; however, we are often only interested in the most likely permutation, which can be retrieved in polynomial time. Specifically, for strategy scheduling the permutation with the highest predicted probability should reflect the orderings in the data. For this purpose, we use Theorem 1 to find the highest probability permutation π* by sorting the values of $\{\lambda(x^*, \hat{w})_j\}_{j=1}^{M}$ in descending order.


Remark 1. A method named ListNet, designed to rank documents for search queries using the Plackett-Luce distribution, is evaluated by Cao et al. [4]. Their evaluation uses a linear basis expansion. We can derive a similar construction in our model by setting

$$\phi(x, d) = [x^{T}, d^{T}]^{T}. \tag{4}$$

Remark 2. The likelihood in Equation (3) can be maximized by minimizing the negative log likelihood ℓ(w) = −log 𝓛(w), which (as shown by Schäfer and Hüllermeier [26]) is convex and therefore can be minimized using gradient-based methods. The minima may, however, be unidentifiable due to translation invariance, as demonstrated by Lemma 1. This problem is eliminated in our Bayesian model by the use of a Gaussian prior, as explained in Sect. 4.
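The prediction step of Eq. (2) followed by Theorem 1 can be sketched as below. The names scores, predicted_schedule and concat_basis, and all feature values, are hypothetical. The sketch also shows a consequence of the concatenated basis of Remark 1. The theorem features add the same term to the exponent of every strategy, so by Lemma 1 they cancel from the Plackett-Luce distribution. The predicted schedule is then the same for every theorem, which is one way to see why Example 2 below needs a different basis.

```python
import math


def scores(x, strategies, w, basis):
    """Eq. (2): λ(x, w)_j = exp(φ(x, d^(j))^T w) for every strategy feature vector d^(j)."""
    return [math.exp(sum(f * wk for f, wk in zip(basis(x, d), w))) for d in strategies]


def predicted_schedule(x, strategies, w, basis):
    """Theorem 1: the most probable permutation sorts strategy indices (1-based) by score."""
    lam = scores(x, strategies, w, basis)
    return sorted(range(1, len(lam) + 1), key=lambda j: lam[j - 1], reverse=True)


def concat_basis(x, d):
    """Linear expansion of Remark 1, φ(x, d) = [x^T, d^T]^T."""
    return list(x) + list(d)


strategies = [[0.2], [0.9], [0.5]]   # hypothetical one-feature strategies
w = [0.3, -0.1, 2.0]                 # hypothetical weights: two for x, one for d
for x in ([1.0, 0.0], [-4.0, 7.5]):
    print(predicted_schedule(x, strategies, w, concat_basis))   # [2, 3, 1] both times
```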


Example 2. Let there be N = 2 theorems and M = 2 strategies. Let the theorems and strategies be characterized by univariate values such that $x^{(1)} = 1$, $x^{(2)} = 2$, $d^{(1)} = 1$ and $d^{(2)} = 2$.

Suppose strategy $d^{(1)}$ is ideal for theorem $x^{(1)}$ and strategy $d^{(2)}$ for $x^{(2)}$, as shown in the following table, where a + indicates the preferred strategy:

            d⁽¹⁾   d⁽²⁾
    x⁽¹⁾     +      −
    x⁽²⁾     −      +

This is evidently an example of a parity problem [34], and hence cannot be modelled by a simple linear expansion using the basis function mentioned in Remark 1. A solution in this instance is to use

$$\phi(x^{(i)}, d^{(j)}) = x^{(i)} d^{(j)}.$$

The parameter w is then one-dimensional, and the required training data take the form $\mathcal{D} = \{([1, 2]^T, 1), ([2, 1]^T, 2)\}$. We find that 𝓛(w) is log-concave (its negative logarithm is convex), with its maximum at ŵ ≈ 0.42, as shown in Fig. 1.
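The maximizer quoted above can be reproduced assuming the product basis φ(x, d) = x·d. Under that basis, theorem 1 contributes −log(1 + e^w) and theorem 2 contributes −log(1 + e^{−2w}) to log 𝓛(w). The sketch below minimizes the negative log likelihood by golden-section search, a derivative-free method chosen here for brevity; it is not the authors' optimizer. All names are hypothetical.

```python
import math

X = [1.0, 2.0]              # theorem features x^(1), x^(2)
D = [1.0, 2.0]              # strategy features d^(1), d^(2)
ORDERS = [[1, 2], [2, 1]]   # preferred orderings π^(1), π^(2)


def log_pl(pi, lam):
    """log Pr(Π = π; λ) for the Plackett-Luce distribution (1-based indices)."""
    ordered = [lam[i - 1] for i in pi]
    return sum(math.log(s) - math.log(sum(ordered[j:])) for j, s in enumerate(ordered))


def neg_log_likelihood(w):
    """-log L(w) of Eq. (3) with φ(x, d) = x·d, so that λ_j = exp(x · d_j · w)."""
    return -sum(log_pl(pi, [math.exp(x * d * w) for d in D]) for x, pi in zip(X, ORDERS))


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal scalar function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d        # the minimum lies in [a, d]
        else:
            a = c        # the minimum lies in [c, b]
    return (a + b) / 2.0


w_hat = golden_section_min(neg_log_likelihood, -5.0, 5.0)
print(round(w_hat, 2))   # 0.42
```

Setting the derivative of log 𝓛 to zero gives σ(w) = 2σ(−2w), where σ is the logistic function. Its root is about 0.4196, which matches the value reported in the text.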
Fig. 1. The likelihood function in Example 2.


4 Bayesian Inference 


We place a Gaussian prior distribution on the parameter w of the model described in Sect. 3. This has two advantages: first, the posterior mode is identifiable, as noted by Johnson et al. [11] and demonstrated in Example 3; second, the parameter is regularized. With this prior specified as the normal distribution

$$w \sim \mathcal{N}(m_0, S_0), \tag{5}$$

and assuming π is independent of 𝒟 given (x, w), the posterior predictive distribution is

$$p(\pi \mid x^*, \mathcal{D}) = \int p(\pi \mid x^*, w)\, p(w \mid \mathcal{D})\, dw,$$

which may be approximated by sampling from the posterior,

$$w^{(s)} \sim p(w \mid \mathcal{D}), \tag{6}$$

to obtain

$$p(\pi \mid x^*, \mathcal{D}) \approx \frac{1}{S} \sum_{s=1}^{S} p(\pi \mid x^*, w^{(s)}). \tag{7}$$
s=1 


Given a new theorem x*, finding the permutation of strategies with the highest probability of success by the approximation above would require its evaluation for every permutation π. This process incurs factorial complexity. We instead make a Bayes point approximation [16] using the mean value of the samples, such that

$$p(\pi \mid x^*, \mathcal{D}) \approx p(\pi \mid x^*, \langle w^{(s)} \rangle) \quad \text{using Eq. (7)}$$

$$= \Pr(\pi \mid \lambda(x^*, \langle w^{(s)} \rangle)) \quad \text{using Eq. (1)},$$

where ⟨·⟩ denotes the mean value. The mean of the Plackett-Luce parameter for Bayesian inference has been used in prior work [8] to obtain good results. Furthermore, with this approximation the highest probability permutation can be obtained by using Theorem 1, thereby incurring only the cost of sorting the items. This saving is substantial when generating a strategy schedule, because it saves on prediction time, which is important for the following reason.
Algorithm 1. Metropolis-Hastings Algorithm

Suppose we have generated samples {w⁽¹⁾, ..., w⁽ⁱ⁾} from the target distribution p. Generate w⁽ⁱ⁺¹⁾ as follows.

1: Generate a candidate value w̃ ∼ q(w̃ | w⁽ⁱ⁾), where q is the proposal distribution.
2: Evaluate r = r(w⁽ⁱ⁾, w̃), where

$$r(w, \tilde{w}) = \min\left\{1, \frac{p(\tilde{w})\, q(w \mid \tilde{w})}{p(w)\, q(\tilde{w} \mid w)}\right\}.$$

3: Set

$$w^{(i+1)} = \begin{cases} \tilde{w} & \text{with probability } r, \\ w^{(i)} & \text{with probability } 1 - r. \end{cases}$$
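Algorithm 1 can be sketched for a scalar parameter as below. The sketch assumes a symmetric Gaussian random-walk proposal, the form used later in Eq. (9). With such a proposal the q-terms of the acceptance ratio cancel, and only the difference of log densities remains. Working with log densities is a common numerical choice rather than something the algorithm statement requires. The function name and its arguments are hypothetical.

```python
import math
import random


def metropolis_hastings(log_target, w0, step, n_samples, seed=0):
    """Random-walk Metropolis-Hastings (Algorithm 1) for a scalar parameter.

    log_target may be unnormalized: its constant cancels in the acceptance ratio.
    The proposal N(w, step^2) is symmetric, so q(w | w~) / q(w~ | w) = 1.
    """
    rng = random.Random(seed)
    w, log_p = w0, log_target(w0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        cand = rng.gauss(w, step)                 # step 1: w~ ~ q(. | w^(i))
        log_p_cand = log_target(cand)
        log_r = min(0.0, log_p_cand - log_p)      # step 2: log r = log min{1, p(w~) / p(w^(i))}
        if rng.random() < math.exp(log_r):        # step 3: move to w~ with probability r
            w, log_p = cand, log_p_cand
            accepted += 1
        samples.append(w)                         # otherwise w^(i+1) = w^(i)
    return samples, accepted / n_samples


# Usage: sample a standard normal that is known only up to its normalizing constant.
samples, rate = metropolis_hastings(lambda w: -0.5 * w * w, w0=0.0, step=2.4,
                                    n_samples=20000, seed=1)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2), round(rate, 2))
```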


Remark 3. While benchmarking and in typical use, a prover is allocated a fixed 
amount of time for a proof attempt, and any time taken to predict a strategy 
schedule must be accounted for within this allocation. Time taken for this pre- 
diction is time taken away from the prover itself which could have been invested 
in the proof search. Therefore, it is essential to minimize schedule prediction 
time. It is particularly wise to favour a saving in prediction time at the cost of 
model optimization and training time. 
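The saving described in Remark 3 can be illustrated on the one-dimensional model of Example 2, assuming the product basis φ(x, d) = x·d. Evaluating Eq. (7) takes one Monte Carlo average per permutation, so ranking by it would mean enumerating all M! permutations. The Bayes point version averages the samples once and then sorts. The sample values and all names below are made up for illustration and are not output of the authors' sampler.

```python
import math
from itertools import permutations

STRATEGY_FEATURES = (1.0, 2.0)   # d^(1), d^(2) as in Example 2, reused for illustration


def pl_probability(pi, lam):
    """Plackett-Luce probability of permutation pi (1-based indices) under scores lam."""
    ordered = [lam[i - 1] for i in pi]
    p = 1.0
    for j, s in enumerate(ordered):
        p *= s / sum(ordered[j:])
    return p


def scores(x, w):
    """λ(x, w)_j = exp(x · d^(j) · w): Eq. (2) with the product basis of Example 2."""
    return [math.exp(x * d * w) for d in STRATEGY_FEATURES]


def mc_predictive(pi, x, samples):
    """Eq. (7): average of Pr(π | x*, w^(s)) over posterior samples, for one permutation."""
    return sum(pl_probability(pi, scores(x, w)) for w in samples) / len(samples)


def bayes_point_schedule(x, samples):
    """Plug the sample mean <w^(s)> into Eq. (1) and sort by score (Theorem 1)."""
    w_mean = sum(samples) / len(samples)
    lam = scores(x, w_mean)
    return sorted(range(1, len(lam) + 1), key=lambda j: lam[j - 1], reverse=True)


samples = [0.30, 0.50, 0.46]     # hypothetical posterior draws
x_new = 2.0
print(bayes_point_schedule(x_new, samples))   # [2, 1]
print({pi: round(mc_predictive(pi, x_new, samples), 3) for pi in permutations((1, 2))})
```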


Remark 4. In our implementation we set $m_0 = 0$. This has the effect of prioritizing smaller weights w in the posterior. Furthermore, we set $S_0 = \eta I$ with η > 0, where I is the identity matrix. Consequently, the hyperparameter η controls the strength of the prior, since the entropy of the Gaussian prior scales linearly with $\log\lvert S_0\rvert$.
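The scaling claim follows from the closed-form differential entropy of a Gaussian. Here k denotes the dimension of w, a symbol we introduce for this derivation:

```latex
H\big[\mathcal{N}(m_0, S_0)\big]
  = \tfrac{1}{2}\log\bigl\lvert 2\pi e\, S_0 \bigr\rvert
  = \tfrac{k}{2}\log(2\pi e) + \tfrac{1}{2}\log\lvert S_0\rvert,
\qquad
S_0 = \eta I_k \;\Longrightarrow\; \tfrac{1}{2}\log\lvert S_0\rvert = \tfrac{k}{2}\log\eta .
```

The entropy is therefore affine in log|S₀|, and hence in log η. A larger η gives a broader and weaker prior.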


Remark 5. A specialization of the Plackett-Luce distribution using the Thurstonian interpretation admits a Gamma distribution conjugate prior [8]. That, however, is unavailable to our model when parametrized as shown in Eq. (1).


5 Sampling 


We use the Markov chain Monte Carlo (MCMC) Metropolis-Hastings algo- 
rithm [38] to generate samples from the posterior distribution. In MCMC sam- 
pling, one constructs a Markov chain whose stationary distribution matches the 
target distribution p. For the Metropolis-Hastings algorithm, stated in Algo- 
rithm 1, this chain is constructed using a proposal distribution y|a ~ q, where q 
is set to a distribution that can be conveniently sampled from. 

Note that while calculating $r$ in Algorithm 1, the normalization constant of
the target density $p$ cancels out. This is to our advantage: to generate samples
$w^{(i)}$ from the posterior, which is, by Eq. (3) and Eq. (5),

$$ p(w \mid D) \propto p(D \mid w)\, p(w) = \mathcal{L}(w)\, \mathcal{N}(m_0, S_0), \qquad (8) $$

the posterior only needs to be computed in this unnormalized form.
In this work, we choose a random walk proposal of the form

$$ q(w^* \mid w) = \mathcal{N}(w^* \mid w, \Sigma_q), \qquad (9) $$

and tune $\Sigma_q$ for efficient sampling. We start the simulation at a local
mode $\hat{w}$, and set $\mathcal{N}(\hat{w}, \Sigma_q)$ to approximate the local curvature of the posterior at
that point using methods by Rossi [25]. Specifically, our procedure for computing
$\Sigma_q$ is as follows.


1. First, writing the posterior from Eq. (8) as

$$ p(w \mid D) = \frac{1}{Z} e^{-E(w)}, $$

where $Z$ is the normalization constant, we have

$$ E(w) = -\log \mathcal{L}(w) - \log \mathcal{N}(m_0, S_0). \qquad (10) $$

We find a local mode $\hat{w}$ by minimizing $E(w)$ using a gradient-based method.
2. Then, using a Laplace approximation [2], we approximate the posterior in the
locality of this mode by

$$ \mathcal{N}(\hat{w}, H^{-1}), \quad \text{where } H = \nabla\nabla E(w)\big|_{\hat{w}} $$

is the Hessian matrix of $E(w)$ evaluated at that local mode.
3. Finally, we set

$$ \Sigma_q = s^2 H^{-1} $$

in Eq. (9), where $s$ is used to tune all the length scales. We set this value to
$s^2 = 2.38$ based on the results by Roberts and Rosenthal [24].
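The three steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: plain gradient descent with a fixed step size stands in for the unspecified "gradient-based method", and the Hessian is approximated by central finite differences rather than computed analytically. `E`, `grad_E`, `lr` and `n_steps` are hypothetical inputs.

```python
import numpy as np


def numerical_hessian(f, w, eps=1e-4):
    """Central finite-difference Hessian of the scalar function f at w."""
    w = np.asarray(w, dtype=float)
    steps = np.eye(w.size) * eps
    H = np.empty((w.size, w.size))
    for i in range(w.size):
        for j in range(w.size):
            ei, ej = steps[i], steps[j]
            H[i, j] = (f(w + ei + ej) - f(w + ei - ej)
                       - f(w - ei + ej) + f(w - ei - ej)) / (4.0 * eps ** 2)
    return 0.5 * (H + H.T)  # symmetrize away round-off asymmetry


def laplace_proposal_cov(E, grad_E, w_init, s2=2.38, lr=0.1, n_steps=2000):
    """Return (w_hat, Sigma_q) with Sigma_q = s2 * H^{-1} at a local mode of E."""
    w = np.asarray(w_init, dtype=float)
    for _ in range(n_steps):            # step 1: gradient descent on E(w)
        w = w - lr * grad_E(w)
    H = numerical_hessian(E, w)         # step 2: Laplace approximation N(w_hat, H^{-1})
    return w, s2 * np.linalg.inv(H)     # step 3: Sigma_q = s^2 H^{-1}
```

For a quadratic $E$ the Laplace approximation is exact, so the returned mode and covariance can be checked against closed-form values.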


Remark 6. When calculating $r$ in Algorithm 1 during sampling, to evaluate the
unnormalized posterior at any point $w^*$ we compute it from Eq. (10) as
$\exp(-E(w^*))$; it is therefore the only form in which the posterior needs to be
coded in the implementation.
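A minimal NumPy sketch of Algorithm 1 with the Gaussian random-walk proposal of Eq. (9). Following Remark 6, the sampler only needs the unnormalized log posterior, `log_target(w) = -E(w)`. Because the random-walk proposal is symmetric, the $q$ terms in $r$ cancel, and any additive constant in `log_target` cancels as well. The function and argument names are illustrative, not the authors' code; `proposal_cov` would be the $\Sigma_q$ obtained from the Laplace-based procedure above.

```python
import numpy as np


def metropolis_hastings(log_target, w0, proposal_cov, n_samples, seed=0):
    """Random-walk Metropolis-Hastings on an unnormalized log density."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(np.asarray(proposal_cov, dtype=float))  # Sigma_q = L L^T
    w = np.asarray(w0, dtype=float)
    log_p = log_target(w)
    samples = np.empty((n_samples, w.size))
    accepted = 0
    for i in range(n_samples):
        # Step 1: candidate from N(w, Sigma_q); symmetric, so q cancels in r.
        cand = w + chol @ rng.standard_normal(w.size)
        log_p_cand = log_target(cand)
        # Step 2: log r = min(0, log p(cand) - log p(w)); normalizers cancel.
        log_r = min(0.0, log_p_cand - log_p)
        # Step 3: accept with probability r, otherwise keep the current state.
        if np.log(rng.uniform()) < log_r:
            w, log_p = cand, log_p_cand
            accepted += 1
        samples[i] = w
    return samples, accepted / n_samples
```

Starting the chain at the local mode $\hat{w}$, as the text describes, avoids a long burn-in; the returned acceptance rate is a quick diagnostic for the scale $s$.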


Example 3 (Gaussian Prior). To demonstrate the effect of using a Gaussian
prior, we build upon Example 2, with the data taking the form

$$ D = \{([1, 2]^\top, 1), ([2, 1]^\top, 2)\}. $$

We perform basis expansion as explained in Sect. 6 with prior parameter $\eta = 1.0$,
kernel parameter $\sigma = 0.1$ and $\varsigma = 2$ centres. Thus, the model parameter is

$$ w = [w_1, w_2]^\top, \quad w \in \mathbb{R}^2. $$

The unnormalized negative log posterior $E(w_1, w_2)$, as defined in Eq. (10), is
shown in Fig. 2b, and the negative log likelihood $\ell(w_1, w_2) = -\log \mathcal{L}(w_1, w_2)$, as
mentioned in Remark 2, is shown in Fig. 2a. Note the contrast in the shape of the
two surfaces. The minimum in Fig. 2a lies along the top-right portion, which is flat
and leads to an unidentifiable point estimate, whereas in Fig. 2b the minimum
lies in a narrow region near the centre. Informally, the Gaussian prior has
lifted the surface up, with an effect that grows with the distance
from the origin.


[Fig. 2 panels: (a) Likelihood function $\ell(w_1, w_2)$; (b) Posterior function $E(w_1, w_2)$. Horizontal axes: $w_1$ from -5.0 to 5.0; vertical axes: $w_2$.]


Fig. 2. Comparison of the shape of the likelihood and the posterior functions. 


6 Basis Expansion 


Example 2 shows how the linear expansion in Remark 1 is ineffective even in very
simple problem instances. The maximum likelihood bilinear model presented by
Schäfer and Hüllermeier [26] is related to our model defined in Sect. 2, with
the basis performing the Kronecker (tensor) product $\phi(x, d) = x \otimes d$. Their
results show that such an expansion produces a competitive model, but one that falls
behind their non-linear model.

To model non-linear interactions between theorems and strategies, we use a 
Gaussian kernel for the basis expansion. 


Definition 3 (Gaussian Kernel). A Gaussian kernel $k$ is defined by

$$ k(y, z) = \exp\left(-\frac{\lVert y - z \rVert^2}{2\sigma^2}\right), \quad \text{for } \sigma > 0. $$


The Gaussian kernel $k(y, z)$ effectively represents the inner product of $y$ and
$z$ in a Hilbert space whose bandwidth is controlled by $\sigma$. Smaller values of $\sigma$
correspond to a higher-bandwidth, more flexible inner product space. Larger
values of $\sigma$ reduce the kernel to a constant function, as detailed in [30].
For our ranking model, we must tune $\sigma$ to balance over-fitting against
under-performance.
We use the Gaussian kernel for basis expansion by setting

$$ \phi(x, d) = \left[k([x^\top, d^\top]^\top, c^{(1)}), \ldots, k([x^\top, d^\top]^\top, c^{(\varsigma)})\right]^\top, $$

where $\{c^{(k)}\}_{k=1}^{\varsigma}$ is a collection of centres. By choosing the centres to be themselves
composed of theorems $x^{(i)}$ and strategies $d^{(j)}$, such that $c^{(k)} = [x^{(i)\top}, d^{(j)\top}]^\top$, the
basis expansion above represents each data item by a non-linear inner product
against other known items.
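A minimal NumPy sketch of Definition 3 and of the expansion $\phi(x, d)$: the theorem and strategy feature vectors are stacked into one vector and compared with each centre. The function names are illustrative, not taken from the authors' code.

```python
import numpy as np


def gaussian_kernel(y, z, sigma):
    """k(y, z) = exp(-||y - z||^2 / (2 sigma^2)), as in Definition 3."""
    diff = np.asarray(y, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-(diff @ diff) / (2.0 * sigma ** 2)))


def basis_expansion(x, d, centres, sigma):
    """phi(x, d): kernel values of the stacked vector [x; d] against each centre."""
    xd = np.concatenate([np.ravel(x), np.ravel(d)])
    return np.array([gaussian_kernel(xd, c, sigma) for c in centres])
```

A very large $\sigma$ drives every kernel value towards 1, which illustrates the collapse to a constant function mentioned above. A very small $\sigma$ makes each centre respond only to near-identical inputs.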

To find the relevant subset of D from which centres should be formed, we 
follow the method described in the steps below. 


1. Initially, we set the collection of centres to every possible centre. That is, for
N theorems and M strategies, we produce a centre for every combination of
the two, thereby producing $C = N \cdot M$ centres.

2. Next, we use $\phi$ to expand every centre to produce the $C \times C$ matrix $\Phi$ such
that

$$ \Phi_{ij} = \phi(c^{(i)})_j = k(c^{(i)}, c^{(j)}). $$

3. Then, we generate a vector $y$ such that $y_i$ represents a score for centre $c^{(i)}$.
Since each centre is a combination of a theorem and a strategy, we set the
score to signify how well the strategy performs for that theorem, as detailed
in Remark 7 below.

4. Finally, we use Automatic Relevance Determination (ARD) [17] with $\Phi$ as
input and $y$ as the response variable. The result is a weight assignment to
each centre signifying its relevance. The $\varsigma$ centres with the highest absolute weights
are chosen, where $\varsigma$ is a parameter that decides the total number of centres.


This method is inspired by the procedure used in Relevance Vector Machines [35] 
for a similar purpose. 
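The four steps can be sketched as below. The ARD step is our own minimal stand-in, not the procedure of [17] as the authors implemented it. It uses MacKay-style fixed-point updates $\alpha_i \leftarrow \gamma_i / m_i^2$ of the per-weight prior precisions while holding the noise precision `beta` fixed, which keeps the sketch short. scikit-learn's `ARDRegression` is an off-the-shelf alternative. All names and default values are hypothetical.

```python
import itertools

import numpy as np


def build_centres(theorems, strategies):
    """Step 1: one centre [x; d] per theorem-strategy pair, C = N * M centres."""
    return [np.concatenate([np.ravel(x), np.ravel(d)])
            for x, d in itertools.product(theorems, strategies)]


def kernel_matrix(centres, sigma):
    """Step 2: Phi[i, j] = k(c_i, c_j) with the Gaussian kernel of Definition 3."""
    C = np.array(centres, dtype=float)
    sq = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def ard_weights(Phi, y, beta=10.0, n_iter=100, alpha_bounds=(1e-6, 1e9)):
    """Step 4: posterior mean weights of y ~ Phi w with priors w_i ~ N(0, 1/alpha_i)."""
    alpha = np.ones(Phi.shape[1])
    for _ in range(n_iter):
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
        m = beta * Sigma @ Phi.T @ y
        # gamma_i: how well weight i is determined by the data (between 0 and 1).
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        alpha = np.clip(gamma / np.maximum(m ** 2, 1e-30), *alpha_bounds)
    return m


def select_centres(centres, y, sigma, n_centres, beta=10.0):
    """Keep the n_centres (varsigma) centres with the largest absolute ARD weight."""
    m = ard_weights(kernel_matrix(centres, sigma), np.asarray(y, dtype=float), beta)
    top = np.argsort(-np.abs(m))[:n_centres]
    return [centres[i] for i in top]
```

Irrelevant centres end up with very large $\alpha_i$ and weights near zero, so ranking by absolute weight mirrors the pruning behaviour of Relevance Vector Machines.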


Remark 7 (score). For a strategy that succeeds in proving a theorem, the score 
for the pair is the fraction of the time allocation left unconsumed by the prover. 
For an unsuccessful strategy-theorem combination, we set the score to a value 
close to zero. 
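Remark 7 can be written as a small function. The value `failure_score` is an assumption, since the text only says that the score for a failure is close to zero; the names are hypothetical.

```python
def pair_score(proved, time_used, time_allocated, failure_score=1e-3):
    """Score of a strategy-theorem pair following Remark 7."""
    if time_allocated <= 0:
        raise ValueError("time_allocated must be positive")
    if proved:
        # Fraction of the time allocation left unconsumed by the prover.
        return (time_allocated - time_used) / time_allocated
    return failure_score
```

For example, a strategy that proves a theorem in 150 s out of a 600 s allocation scores 0.75.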


Remark 8 ($\varsigma$). The parameter $\varsigma$ is another tunable parameter which, like
the parameter $\sigma$ earlier in this section, controls the model complexity
introduced by the basis expansion. Both must be tuned together.


7 Model Selection and Time Allocations 


From Remark 8, $\varsigma$ and $\sigma$ are hyperparameters that control the complexity
introduced into our model through the Gaussian basis expansion, and Remark 4
introduces $\eta$, the hyperparameter that controls the strength of the prior. The final
model is selected by tuning them. Tuning must avoid overfitting to the
training data and must maximize, during testing, either the savings in proof-search
time or the number of theorems proved. However, we do not have a closed-form
expression relating these parameters to this aim, so any combination of the
parameters can be judged only by testing it.

In this work we have used Bayesian optimization [29] to optimize these
hyperparameters. Bayesian optimization is a black-box parameter optimization
method that attempts to find a global optimum within a set resource budget.
It models the optimization target as a user-specified objective
function, which maps from the parameter space to a loss metric. This model
of the objective function is constructed with Gaussian Process (GP) regression [22]
from data generated by repeatedly evaluating the objective function.

Our specified objective function maps from the hyperparameters $(\varsigma, \sigma, \eta)$ to a
loss metric $\epsilon$. We use cross-validation within the training data while calculating $\epsilon$
to penalize hyperparameters that over-fit. Hyperparameters are tuned at training
time only, after which they are fixed for subsequent testing. The final test set is
never used for any hyperparameter optimization.

In the method presented thus far, we only permute strategies with fixed
time allocations to build a sequence for a strategy schedule. In this setting, the
number of theorems proved cannot change, but the time taken to prove theorems
can be reduced. Therefore, with this aim, a useful metric for $\epsilon$ is the total time
taken by the theorem prover to prove the theorems in the cross-validation test
set.

However, we can take further advantage of the hyperparameter tuning phase
to additionally tune the time allocated to each strategy, by treating these times
as hyperparameters. Therefore, for each strategy $d^{(j)}$ we create a hyperparameter
$\nu^{(j)} \in (0, 1)$ which sets the proportion of the proof time allocated to that strategy.
We can then optimize our model to maximize the number of theorems proved;
the number of theorems left unproved is then a viable metric for $\epsilon$. Note that once
the $\nu^{(j)}$ are set, the time allocation for $d^{(j)}$ is fixed to $\nu^{(j)}$, irrespective of its order
in the strategy schedule.


Remark 9. Our results include two types of experiment: 


— one where the time allocations for each strategy are set to the defaults shipped 
with our reference theorem prover, and so we optimize for saving proof time; 
and 

— another wherein we allocate time to each strategy during the hyperparam- 
eter tuning phase, and so we optimize for proving the maximum number of 
theorems. 


8 Training Data and Feature Extraction 


Our chosen theorem prover, iLeanCoP, is shipped with a fixed strategy schedule
consisting of 5 strategies. It splits the allocated proof time across the first four
strategies in the proportions 2%, 60%, 20% and 10%. However, only the first strategy
is complete and is therefore usually expected to use up its entire time allocation. The
remaining strategies are incomplete and may exit early on failure. Therefore,
the fifth and final strategy, which we refer to as the fallback strategy, is allocated
all the remaining time.


Emulating iLeanCoP. We have constructed a dataset by attempting to prove
every theorem in our problem library using each of these strategies individually.
With this information, the result of any proof attempt can be calculated by
emulating the behaviour of iLeanCoP. This is how we evaluate the predicted
schedules: we emulate a proof attempt by iLeanCoP using that schedule for
each theorem in the test set. For a faithful emulation of the fallback strategy, it
is always attempted last, and therefore any new schedule is only a permutation
of the first four strategies. Our experiments allocate a time of 600 s per theorem.
The dataset is built to ensure that, within this proof time, any such strategy
permutation can be emulated. We kept a timeout of 1200 s per strategy per
theorem when building the dataset, which is more than sufficient for the current
experiments and gives us headroom for future experiments with longer proof
times.
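A sketch of such an emulation under stated assumptions. Each recorded run is assumed to hold whether the strategy proved the theorem and the time at which it stopped (proof, early failure, or the 1200 s recording timeout). A strategy still running at the end of its slice is cut off. `fractions[j]` is strategy j's share of the 600 s budget, fixed regardless of its position in the schedule. The record layout and names are hypothetical, not the authors' emulator.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Run:
    """Recorded outcome of one strategy on one theorem."""
    proved: bool
    time: float  # seconds until proof, early failure, or the recording timeout


def emulate(schedule: Sequence[int], runs: Sequence[Run], fractions: Sequence[float],
            fallback: Run, total: float = 600.0) -> Optional[float]:
    """Emulated proof time of a schedule, or None if the theorem is not proved."""
    elapsed = 0.0
    for j in schedule:
        budget = fractions[j] * total
        run = runs[j]
        if run.proved and run.time <= budget:
            return elapsed + run.time
        elapsed += min(run.time, budget)  # failed early, or cut off at its budget
    remaining = total - elapsed           # the fallback strategy gets what is left
    if fallback.proved and fallback.time <= remaining:
        return elapsed + fallback.time
    return None
```

With the default proportions 2%, 60%, 20% and 10%, a complete first strategy that fails consumes its full 12 s slice before the next strategy starts. Reordering the schedule can therefore save time without changing which strategies are tried.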


Strategy Features. Each strategy in iLeanCoP consists of a time allocation
and parameter settings; the parameters are described by Otten [19]. We use a
one-hot encoding of each strategy's parameter settings as its feature representation,
as shown in Table 1. The table also shows a further feature noting the completeness
of each strategy. Another feature (not shown in the table) holds the
time allocated to each strategy. Note that the fallback strategy is used in prover
emulation but not in schedule prediction.


Table 1. Features of the four main strategies. 


Strategy              | def | scut | cut | comp(7) | conj | Completeness
def,scut,cut,comp(7)  |  1  |  1   |  1  |    1    |  0   |      1
def,scut,cut          |  1  |  1   |  1  |    0    |  0   |      0
conj,scut,cut         |  0  |  1   |  1  |    0    |  1   |      0
def,conj,cut          |  1  |  0   |  1  |    0    |  1   |      0

(Columns def to conj are the one-hot parameter features.)


Theorem Features. The TPTP problem library contains a large, compre- 
hensive collection of theorems and is designed for testing automated theorem 
provers. The problems are taken from a range of domains such as Logic Cal- 
culi, Algebra, Software Verification, Biology and Philosophy, and presented in 
multiple logical forms. For iLeanCoP, we select the subset in first-order form, 
denoted there as FOF. In version 7.1.0, there are 8157 such problems covering 43 
domains. Each problem consists of a set of formulae and a goal theorem. The 
problems are of varying sizes. For example, the problem named HWV134+1 from
the Hardware Verification domain contains 128975 formulae, whilst SET703+4 
from the Set Theory domain contains only 12. 

We have constructed a dataset containing features extracted from the first- 
order logic problems in TPTP (see Appendix A). Here, we describe how those 
features were developed. 

In deployment, a prover using our method to generate strategy schedules 
would have to extract features from the goal theorem at the beginning of a 
proof attempt. To minimize the computational overhead of feature extraction, 
in keeping with our goal noted in Remark 3, we use features that can be collected 
when the theorem is parsed by the prover. The collection of features developed 
in this work is based on the authors’ prior experience, and later we will briefly 
examine the quality of each feature to discard the uninformative ones. We extract 
the following features, which are all considered candidates for the subsequent 
feature selection process. 


Symbol Counts: A count of the logical connectives and quantifiers. We extract 
one feature per symbol by tracking lexical symbols encountered while parsing. 

Quantifier Rank: The maximum depth of nesting of quantifiers. 

Quantifier Count: A count of the number of quantifiers. 

Mean and Maximum Function Arity: Obtained by keeping track of func- 
tions during parsing. 

Number of Functions: A count of the number of functions. 

Quantifier Alternations: A count of the number of times the quantifiers flip 
between the existential and universal. When calculated by examining only the 
sequence of lexical symbols, the count may be inaccurate. An accurate count 
is obtained by tracking negations during parsing while collecting quantifiers. 
We extract both as candidates. 
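A sketch of three of these features on a toy formula tree. The tuple-based AST is hypothetical and is not iLeanCoP's parser. We read "alternations" as the number of flips between consecutive quantifiers in parse order, an interpretation the text does not pin down. The lexical count ignores polarity. The accurate count flips a quantifier's kind under negation and in the antecedent of an implication, which is one way of tracking negations. Equivalence is left out because it has mixed polarity.

```python
# Formulae as nested tuples (hypothetical AST):
#   ('forall', 'X', body) | ('exists', 'X', body) | ('not', f)
#   ('and' | 'or' | 'implies', f, g) | ('atom', 'p')
QUANTIFIERS = {'forall', 'exists'}


def quantifier_rank(f):
    """Maximum nesting depth of quantifiers."""
    op = f[0]
    if op in QUANTIFIERS:
        return 1 + quantifier_rank(f[2])
    if op == 'not':
        return quantifier_rank(f[1])
    if op in ('and', 'or', 'implies'):
        return max(quantifier_rank(f[1]), quantifier_rank(f[2]))
    return 0


def quantifier_sequence(f, positive=True, track_polarity=False):
    """Quantifiers in parse order; with track_polarity, flip those in negative position."""
    op = f[0]
    if op in QUANTIFIERS:
        q = op
        if track_polarity and not positive:
            q = 'exists' if op == 'forall' else 'forall'
        return [q] + quantifier_sequence(f[2], positive, track_polarity)
    if op == 'not':
        return quantifier_sequence(f[1], not positive, track_polarity)
    if op == 'implies':  # the antecedent has negative polarity
        return (quantifier_sequence(f[1], not positive, track_polarity)
                + quantifier_sequence(f[2], positive, track_polarity))
    if op in ('and', 'or'):
        return (quantifier_sequence(f[1], positive, track_polarity)
                + quantifier_sequence(f[2], positive, track_polarity))
    return []


def alternations(f, track_polarity=False):
    """Number of flips between existential and universal in the quantifier sequence."""
    seq = quantifier_sequence(f, track_polarity=track_polarity)
    return sum(a != b for a, b in zip(seq, seq[1:]))
```

For $\forall X\, \neg \exists Y\, p$, the lexical count is 1, but the accurate count is 0, since $\neg \exists Y$ behaves as a universal quantifier. The quantifier count is the length of the sequence.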


Feature Selection and Pre-processing. We examine the degree of associa- 
tion between the individual theorem features described above and the speed with 
which the strategies solve each theorem; for this we use the Maximal Informa- 
tion Coefficient (MIC) measure [23]. For every theorem we calculate the score, 
as defined in Remark 7, averaged over all strategies. This score is paired with 
each feature to calculate its MIC. Most lexical symbols achieve an MIC close to 
zero. We selected the features with relatively high MIC for the presented work, 
and these are shown in Fig. 3. 

The two features based on quantifier alternations are clearly correlated, but 
both meet the above criterion for selection. Correlations can also be expected 
between the other features. Furthermore, our features range over different scales. 
For example, the maximal function arity in TPTP averages 2, whereas the num- 
ber of predicate symbols averages 2097. It is desirable to remove these correla- 
tions to alleviate any burden on the subsequent modelling phase, and to stan- 
dardize the features to zero mean and unit variance to create a feature space with 
similar length-scales in all dimensions. The former is achieved by decorrelation, 
the latter by standardization, and both together by a sphering transformation. 
Fig. 3. MIC between selected features and scores.


We transform our extracted features accordingly using Zero-phase Component
Analysis (ZCA), which ensures that the transformed data is as close as possible to
the original [6].
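A minimal NumPy sketch of ZCA sphering as described. The features are first centred and then multiplied by $W = U \Lambda^{-1/2} U^\top$, where $U \Lambda U^\top$ is the eigendecomposition of the feature covariance. Rotating back by $U$ is what keeps the result closest to the original features. The small `eps` regularizer is our addition for numerical safety.

```python
import numpy as np


def zca_fit(X, eps=1e-8):
    """Return (mean, W) so that (X - mean) @ W has (near) identity covariance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    lam, U = np.linalg.eigh(cov)                     # cov = U diag(lam) U^T
    W = U @ np.diag(1.0 / np.sqrt(lam + eps)) @ U.T  # symmetric whitening matrix
    return mean, W
```

At prediction time, the same `mean` and `W` fitted on the training theorems would be applied to a new theorem's features. This is a single matrix-vector product, in keeping with the low prediction overhead demanded by Remark 3.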


Coverage. As mentioned above, we run iLeanCoP on every first-order theo- 
rem in TPTP with each strategy allocated 1200s. Although every theorem in 
intuitionistic logic also holds for classical logic, the converse does not hold. For 
that reason and because of the limitations of iLeanCoP, many theorems remain 
unproved by any strategy. We exclude these theorems from our experiments, 
leaving us with a data-set of 2240 theorems. 


9 Experiments 


We present two experiments in this work, as noted in Remark 9. In this section, 
we describe our experimental apparatus in detail. 
As noted in Sect. 8, our data contains: 


– N = 2240 theorems that are usable in our experiments;

– five strategies, of which M = 4 are used to build strategy schedules, since one
is a fallback strategy; and

– features $x^{(i)}$ of theorems, where $i \in [1, N]$, and features $d^{(j)}$ of strategies, where
$j \in [1, M]$.


This data needs to be presented to our model for training in the form of
$D = \{(\pi^{(i)}, x^{(i)})\}_{i=1}^{N}$, as described in Sect. 3. Since the two experiments have
slightly different goals, we specialize $D$ according to each.

When aiming to predict schedules that minimize the time taken to prove
theorems, a natural value for $\pi^{(i)}$ is the index order that sorts strategies by
increasing time taken to prove theorem $i$. However, some strategies
may fail to prove theorem $i$ within their time allocation. In that case, we consider
the failed strategies equally bad and place them last in the ordering $\pi^{(i)}$.
Furthermore, we create additional items $(\pi', x^{(i)})$ in $D$ by permuting the
positions of the failed strategies in $\pi^{(i)}$ to create multiple orderings $\pi'$.

When the goal is only to prove more theorems, the strategies that succeed 
are all considered equally ranked above the failed strategies. In this mode, the 
successful strategies are similarly permuted in the data, in addition to those that 
failed. 
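Both ways of specializing $D$ can be sketched as below. The sketch uses 0-based strategy indices instead of the paper's 1-based ones, and `times[j] is None` marks a failed strategy; all names are hypothetical.

```python
import itertools


def ranking_items(times, x, prove_more=False):
    """Expand one theorem with features x into ranking items (pi, x).

    times[j] is the proof time of strategy j, or None if it failed.
    Failed strategies are tied and placed last; every order of the tied
    block yields one item. With prove_more=True, successes are tied too.
    """
    solved = [j for j, t in enumerate(times) if t is not None]
    failed = [j for j, t in enumerate(times) if t is None]
    if prove_more:
        heads = list(itertools.permutations(solved))
    else:
        heads = [tuple(sorted(solved, key=lambda j: times[j]))]
    items = []
    for head in heads:
        for tail in itertools.permutations(failed):
            items.append((list(head) + list(tail), x))
    return items
```

For strategy times (30 s, fail, 5 s, fail), the time-saving mode yields the orderings [2, 0, 1, 3] and [2, 0, 3, 1]. The prove-more mode also permutes the two successful strategies, giving four items.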

In each experiment, a random one-third of the $N$ theorems are separated
into a holdout test set $N_{\text{test}}$, leaving behind a training set $N_{\text{train}}$. This training set
is first used for hyperparameter tuning with Bayesian optimization. As explained in Sect. 7, each
hyperparameter combination is tested with five-fold cross-validation within $N_{\text{train}}$,
to penalize instances that overfit to $N_{\text{train}}$. This results in estimated optimum values
for the hyperparameters. These are used to configure the model, which is then trained
on $N_{\text{train}}$ and finally evaluated on $N_{\text{test}}$. The whole process is repeated ten times
with new random splits into $N_{\text{train}}$ and $N_{\text{test}}$ to create one set of ten results for that
experiment.


10 Results 


Each experiment, repeated ten times, is conducted in two phases: first, hyperpa- 
rameter optimization; and second, model training and evaluation. The bounds 
on the search space in the first phase were always the same (see Appendix A). 
The holdout test set contained 747 theorems. A proof time of 600 s was emulated. 


10.1 Experiment 1: Optimizing Proof Attempt Time 


The results are shown in Fig. 4. The total prediction time for all 747 theorems,
averaged across the trials, is 0.14 s.

The times across proof attempts are not normally distributed, for both the
unmodified iLeanCoP schedule and the predicted ones, as indicated by a Jarque-Bera
test. Therefore, we used the right-tailed Wilcoxon signed-rank test for a
pair-wise comparison of the times taken for each theorem by the original schedule
in iLeanCoP versus the predicted schedules, resulting in a p-value of less
than 10~° in each trial. This rejects the null hypothesis in favour of the alternative
hypothesis that the reduction in time taken to prove each theorem comes from a
distribution with median greater than zero, i.e., the time savings are statistically
significant. Furthermore, we note from Fig. 4 a saving of more than 50% in the total
proof time in each trial.

Fig. 4. Results of Experiment 1. Proof times are compared with precision 10~°s.
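The paper does not say which implementation of the test was used. As an illustration only, the sketch below computes the right-tailed Wilcoxon signed-rank statistic $W^+$ on the paired differences orig minus pred. It uses the textbook large-sample normal approximation without tie or continuity corrections. `scipy.stats.wilcoxon(orig, pred, alternative='greater')` is a full-featured alternative; the function name here is hypothetical.

```python
import math


def wilcoxon_right_tailed(orig, pred):
    """Return (W+, approximate p-value) for H1: median of orig - pred > 0."""
    d = [a - b for a, b in zip(orig, pred) if a != b]  # zero differences are dropped
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied |d| values
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) under H0
```

With 747 paired proof times per trial, the normal approximation is reasonable, although many tied or zero differences would call for the corrections omitted here.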


10.2 Experiment 2: Proving More Theorems 


We set our hyperparameter search to find time allocations for strategies. The
resulting predicted schedules have gains and losses when compared to the original
schedule, as shown in the four facets of Fig. 5. However, there is a consistent net gain
in the number of theorems proved, five theorems on average, as evident
from the mean values in (t) and (¢).
Fig. 5. Comparison of the proof attempts by the original (orig.) and predicted (pred.)
schedules in Experiment 2. Theorems which are proved by pred. but could not be
proved by orig. are counted in f, and vice versa in f.


11 Related Work 


Prior work on machine learning for algorithm selection, such as that introduced 
by Leyton-Brown et al. [13], is a precursor to our work. In that topic, the machine 
learning methods must perform the task of selecting a good algorithm from 
within a portfolio to solve the given problem instance. Typically, as was the case 
in the work by Leyton-Brown et al. [13], the learning methods predict the runtime 
of all algorithms, and then pick the fastest predicted one. This line of enquiry
has been extended to select algorithms for SMT solvers; a recent example is
MachSMT by Scott et al. [28]. The machine learning models in MachSMT are
trained by considering all the portfolio members in pairs for each problem in the
training set. This method is called pairwise ranking, which contrasts with our
method, called list-wise ranking, in which we consider the full list of portfolio
members all together.

In terms of the machine learning task, the work on scheduling solvers bears 
greater similarity to our presented work. In MedleySolver, for example, Pim- 
palkhare et al. [20] frame this task as a multi-armed bandit problem. They 
predict a sequence of solvers as well as the time allocation for each to generate 
schedules for the goal problems. MedleySolver is able to solve more problems 
than any individual solver would on its own. 

With an approach that contrasts with ours, Hula et al. [10] have made use of 
Graph Neural Networks (GNNs) for solver scheduling. They produce a regression 
model to predict, for the given problem, the runtime of all the solvers; which 
is used as the key to sort the solvers in increasing order of predicted runtime 
to build a schedule. This is an example of point-wise ranking. The authors use 
GNNs to automatically discover features for machine learning. They combine 
this feature extraction with training of the regression model. They achieve an 
increase in the number of problems solved as well as a reduction in the total 
proof time. Meanwhile, our use of manual feature engineering combined with 
statistical methods for selection and normalization has certain advantages. For 
one, we can analyse our features and derive a subjective interpretation of their 
efficacy. Additionally, our features effectively impart our domain knowledge onto 
the model. Such domain knowledge may not be available in the data itself. 
Manual feature engineering such as ours can be combined with automatic feature 
extraction to reap the benefits of both. 


12 Conclusions 


We have presented a method to specialize, for the given goal theorem, the 
sequence of strategies in the schedule used in each proof attempt. A Bayesian 
machine learning model is trained in this method using data generated by test- 
ing the prover of interest. When evaluated with the iLeanCoP prover using the 
TPTP library as a benchmark, our results show a significant reduction in the 
time taken to prove theorems. For theorems that are successfully proved, the 
average time saving is above 50%. The prediction time is on average low enough 
to have a negligible impact on the resources subtracted from the proof search 
itself. 

We also extend this method to optimize time allocations to each strategy. 
In this setting, our results show a notable increase in the number of theorems 
proved. 

This work shows, by example, that Bayesian machine learning models 
designed specifically to augment heuristics in theorem provers, with detailed 
consideration of the computational compromises required in this setting, can 
deliver substantial improvements. 
A Implementation, Code and Data
Abstract. Anti-unification aims at computing generalizations for given
terms, retaining their common structure and abstracting differences by 
variables. We study quantitative anti-unification where the notion of the 
common structure is relaxed into “proximal” up to the given degree with 
respect to the given fuzzy proximity relation. Proximal symbols may 
have different names and arities. We develop a generic set of rules for 
computing minimal complete sets of approximate generalizations and 
study their properties. Depending on the characterizations of proximities 
between symbols and the desired forms of solutions, these rules give rise 
to different versions of concrete algorithms. 


Keywords: Generalization · Anti-unification · Quantitative theories ·
Fuzzy proximity relations


1 Introduction 


Generalization problems play an important role in various areas of mathematics,
computer science, and artificial intelligence. Anti-unification [12,14] is a
logic-based method for computing generalizations. Originally used for inductive
and analogical reasoning, it has recently been applied to recursion scheme
detection in functional programs [4], programming by examples in
domain-specific languages [13], learning bug-fixing from software code repositories [3,15],
automatic program repair [7], preventing bugs and misconfiguration in
services [11], and linguistic structure learning for chatbots [6], to name just a few.

In most of the existing theories where anti-unification is studied, the back- 
ground knowledge is assumed to be precise. Therefore, those techniques are not 
suitable for reasoning with incomplete, imprecise information (which is very 
common in real-world communication), where the exact equality is replaced by 
its (quantitative) approximation. Fuzzy proximity and similarity relations are
notable examples of such extensions. These kinds of quantitative theories have
many useful applications, some of the most recent being related to artificial
intelligence, program verification, probabilistic programming, or natural language
processing. Many tasks arising in these areas require reasoning methods and
computational tools that deal with quantitative information. For instance, approximate
inductive reasoning, reasoning and programming by analogy, and similarity
detection in programming language statements or in natural language texts could
benefit from solving approximate generalization constraints, which is a
theoretically interesting and challenging task. Investigations in this direction have been
started only recently. In [1], the authors proposed an anti-unification algorithm 
for fuzzy similarity (reflexive, symmetric, min-transitive) relations, where mis- 
matches are allowed not only in symbol names, but also in their arities (fully 
fuzzy signatures). The algorithm from [9] is designed for fuzzy proximity (i.e., 
reflexive and symmetric) relations with mismatches only in symbol names. 

In this paper, we study approximate anti-unification from a more gen- 
eral perspective. The considered relations are fuzzy proximity relations. Prox- 
imal symbols may have different names and arities. We consider four differ- 
ent variants of relating arguments between different proximal symbols: unre- 
stricted relations/functions, and correspondence (i.e. left- and right-total) rela- 
tions/functions. A generic set of rules for computing minimal complete sets of 
generalizations is introduced and its termination, soundness and completeness 
properties are proved. From these rules, we obtain concrete algorithms that 
deal with different kinds of argument relations. We also show how the existing 
approximate anti-unification algorithms and their generalizations fit into this 
framework. 


Organization: In Sect. 2 we introduce the notation and definitions. Section 3 is 
devoted to a technical notion of term set consistency and to an algorithm for 
computing elements of consistent sets of terms. It is used later in the main 
set of anti-unification rules, which are introduced and characterized in Sect. 4. 
The concrete algorithms obtained from those rules are also described in this 
section. In Sect. 5, we discuss complexity. Section 6 offers a high-level picture of 
the studied problems and concludes. 
An extended version of this work can be found in the technical report [8]. 


2 Preliminaries 


Proximity Relations. Given a set $S$, a mapping $\mathcal{R}$ from $S \times S$ to the real
interval $[0, 1]$ is called a binary fuzzy relation on $S$. By fixing a number $\lambda$,
$0 < \lambda \le 1$, we can define the crisp (i.e., two-valued) counterpart of $\mathcal{R}$, named the
$\lambda$-cut of $\mathcal{R}$, as $\mathcal{R}_\lambda := \{(s_1, s_2) \mid \mathcal{R}(s_1, s_2) \ge \lambda\}$. A fuzzy relation $\mathcal{R}$ on a set
$S$ is called a proximity relation if it is reflexive ($\mathcal{R}(s, s) = 1$ for all $s \in S$)
and symmetric ($\mathcal{R}(s_1, s_2) = \mathcal{R}(s_2, s_1)$ for all $s_1, s_2 \in S$). A T-norm $\wedge$ is an
associative, commutative, non-decreasing binary operation on $[0, 1]$ with 1 as
the unit element. We take minimum in the role of T-norm.
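A small Python sketch of these definitions on a finite set, with a fuzzy relation given as a function. The helper names and the relation degrees in the usage note are hypothetical.

```python
from itertools import product


def is_proximity(R, S):
    """Reflexive (R(s, s) = 1) and symmetric fuzzy relation on the finite set S."""
    return all(R(s, s) == 1 for s in S) and all(
        R(a, b) == R(b, a) for a, b in product(S, S))


def lambda_cut(R, S, lam):
    """Crisp counterpart R_lambda = {(a, b) | R(a, b) >= lam}."""
    return {(a, b) for a, b in product(S, S) if R(a, b) >= lam}


t_norm = min  # the minimum T-norm used throughout
```

For example, take $\mathcal{R}(f, g) = 0.7$, $\mathcal{R}(g, h) = 0.5$ and $\mathcal{R}(f, h) = 0$. This relation is a proximity relation, and its 0.6-cut relates only $f$ and $g$ besides the reflexive pairs. Since $\mathcal{R}(f, h) < \min(\mathcal{R}(f, g), \mathcal{R}(g, h))$, it is not min-transitive, so it is not a similarity relation.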


Terms and Substitutions. We consider a first-order alphabet consisting of a
set of fixed-arity function symbols $\mathcal{F}$ and a set of variables $\mathcal{V}$, which includes a
special symbol _ (the anonymous variable). The set of named (i.e., non-anonymous)
variables $\mathcal{V} \setminus \{\_\}$ is denoted by $\mathcal{V}_N$. When the set of variables is not explicitly
specified, we mean $\mathcal{V}$. The set of terms $\mathcal{T}(\mathcal{F}, \mathcal{V})$ over $\mathcal{F}$ and $\mathcal{V}$ is defined in the
standard way: $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$ iff $t$ is defined by the grammar $t ::= x \mid f(t_1, \ldots, t_n)$,
where $x \in \mathcal{V}$ and $f \in \mathcal{F}$ is an $n$-ary symbol with $n \ge 0$. Terms over $\mathcal{T}(\mathcal{F}, \mathcal{V}_N)$ are
defined similarly except that all variables are taken from $\mathcal{V}_N$.

We denote arbitrary function symbols by $f, g, h$, constants by $a, b, c$, variables
by $x, y, z, v$, and terms by $s, t, r$. The head of a term is defined as $\mathrm{head}(x) := x$
and $\mathrm{head}(f(t_1, \ldots, t_n)) := f$. For a term $t$, we denote by $\mathcal{V}(t)$ (resp. $\mathcal{V}_N(t)$)
the set of all variables (resp. all named variables) appearing in $t$. A term is called
linear if no named variable occurs in it more than once.

The deanonymization operation deanon replaces each occurrence of
the anonymous variable in a term by a fresh variable. For instance,
$\mathrm{deanon}(f(\_, x, g(\_))) = f(y', x, g(y''))$, where $y'$ and $y''$ are fresh. Hence,
$\mathrm{deanon}(t) \in \mathcal{T}(\mathcal{F}, \mathcal{V}_N)$ is unique up to variable renaming for all $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$.
Moreover, $\mathrm{deanon}(t)$ is linear iff $t$ is linear.
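A sketch of deanon and of the linearity check on a hypothetical term encoding. Variables are strings, the anonymous variable is `'_'`, and an application $f(t_1, \ldots, t_n)$ is the tuple `('f', t1, ..., tn)`, so a constant is a one-element tuple. Fresh names of the form `_v0`, `_v1`, ... are assumed not to occur in the input.

```python
import itertools

ANON = '_'


def deanon(t, fresh=None):
    """Replace each occurrence of the anonymous variable by a fresh variable."""
    if fresh is None:
        fresh = (f"_v{i}" for i in itertools.count())
    if isinstance(t, str):  # a variable
        return next(fresh) if t == ANON else t
    return (t[0],) + tuple(deanon(arg, fresh) for arg in t[1:])


def is_linear(t):
    """True iff no named variable occurs more than once in t."""
    seen = set()

    def walk(s):
        if isinstance(s, str):
            if s == ANON:
                return True
            if s in seen:
                return False
            seen.add(s)
            return True
        return all(walk(arg) for arg in s[1:])

    return walk(t)
```

Applied to $f(\_, x, g(\_))$, the sketch returns $f(\_v0, x, g(\_v1))$, matching the example above up to variable naming. Both terms are linear, as the stated property requires.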

The notions of term depth, term size and a position in a term are defined in
the standard way, see, e.g., [2]. By $t|_p$ we denote the subterm of $t$ at position $p$,
and by $t[s]_p$ a term that is obtained from $t$ by replacing the subterm at position
$p$ by the term $s$.

A substitution is a mapping from $\mathcal{V}$ to $\mathcal{T}(\mathcal{F}, \mathcal{V}_N)$ (i.e., without anonymous
variables) which is the identity almost everywhere. We use the Greek letters
$\sigma, \vartheta, \varphi$ to denote substitutions, except for the identity substitution, which is
written as $\mathrm{Id}$. We represent substitutions with the usual set notation. Application of
a substitution $\sigma$ to a term $t$, denoted by $t\sigma$, is defined as $\_\sigma := \_$, $x\sigma := \sigma(x)$,
$f(t_1, \ldots, t_n)\sigma := f(t_1\sigma, \ldots, t_n\sigma)$. Substitution composition is defined as a
composition of mappings. We write $\sigma\vartheta$ for the composition of $\sigma$ with $\vartheta$.


Argument Relations and Mappings. Given two sets $N = \{1, \ldots, n\}$ and
$M = \{1, \ldots, m\}$, a binary argument relation over $N \times M$ is a (possibly empty)
subset of $N \times M$. We denote argument relations by $\rho$. An argument relation
$\rho \subseteq N \times M$ is (i) left-total if for all $i \in N$ there exists $j \in M$ such that $(i, j) \in \rho$;
(ii) right-total if for all $j \in M$ there exists $i \in N$ such that $(i, j) \in \rho$.
Correspondence relations are those that are both left- and right-total.

An argument mapping is an argument relation that is a partial injective
function. In other words, an argument mapping $\pi$ from $N = \{1, \ldots, n\}$ to
$M = \{1, \ldots, m\}$ is a function $\pi : I_n \to I_m$, where $I_n \subseteq N$, $I_m \subseteq M$ and $|I_n| = |I_m|$.
Note that it can also be the empty mapping $\pi : \emptyset \to \emptyset$. The inverse of an
argument mapping is again an argument mapping.
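These properties are straightforward to check when a relation is given as a set of 1-based index pairs. The helper names are hypothetical.

```python
def is_left_total(rho, n):
    """Every i in {1, ..., n} is related to some j."""
    return {i for i, _ in rho} >= set(range(1, n + 1))


def is_right_total(rho, m):
    """Every j in {1, ..., m} is related to some i."""
    return {j for _, j in rho} >= set(range(1, m + 1))


def is_correspondence(rho, n, m):
    """Both left- and right-total."""
    return is_left_total(rho, n) and is_right_total(rho, m)


def is_argument_mapping(rho):
    """Partial injective function: no argument position is used twice on either side."""
    left = [i for i, _ in rho]
    right = [j for _, j in rho]
    return len(set(left)) == len(left) and len(set(right)) == len(right)
```

For instance, the relation $\{(1,1), (3,2)\}$ between a ternary and a binary symbol is right-total and an argument mapping, but not left-total, because argument 2 of the ternary symbol is unrelated.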

Given a proximity relation $\mathcal{R}$ over $\mathcal{F}$, we assume that for each pair of function
symbols $f$ and $g$ with $\mathcal{R}(f, g) = \alpha > 0$, where $f$ is $n$-ary and $g$ is $m$-ary, there is
also given an argument relation $\rho$ over $\{1, \ldots, n\} \times \{1, \ldots, m\}$. We use the
notation $f \sim^{\rho}_{\mathcal{R},\alpha} g$. These argument relations should satisfy the following conditions:
- $\rho$ is the empty relation if $f$ or $g$ is a constant;
- $\rho$ is the identity if $f = g$;
- $f \sim^{\rho}_{\mathcal{R},\alpha} g$ iff $g \sim^{\rho^{-1}}_{\mathcal{R},\alpha} f$, where $\rho^{-1}$ is the inverse of $\rho$.
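These properties of argument relations are easy to check mechanically. The Python sketch below represents a relation over $\{1, \ldots, n\} \times \{1, \ldots, m\}$ as a set of pairs. It tests left-totality, right-totality, the correspondence property, and whether the relation is an argument mapping (a partial injective function). The function names are illustrative, not taken from the paper. The sample relation is the one between $\mathit{author}_1$ and $\mathit{author}_2$ in Example 1.

```python
def left_total(rho, n, m):
    """Every i in {1..n} is related to some j in {1..m}."""
    return all(any((i, j) in rho for j in range(1, m + 1)) for i in range(1, n + 1))


def right_total(rho, n, m):
    """Every j in {1..m} is related to some i in {1..n}."""
    return all(any((i, j) in rho for i in range(1, n + 1)) for j in range(1, m + 1))


def is_correspondence(rho, n, m):
    return left_total(rho, n, m) and right_total(rho, n, m)


def is_mapping(rho):
    """Partial injective function: no argument position occurs twice on either side."""
    lefts = [i for i, _ in rho]
    rights = [j for _, j in rho]
    return len(lefts) == len(set(lefts)) and len(rights) == len(set(rights))


def inverse(rho):
    return {(j, i) for i, j in rho}


# author_1 (3 arguments) vs author_2 (2 arguments): {(1,1), (3,2)}
rho12 = {(1, 1), (3, 2)}
print(is_mapping(rho12), is_correspondence(rho12, 3, 2))  # True False
```

The relation is a mapping but not a correspondence relation, because the middle initial (argument 2 of $\mathit{author}_1$) has no counterpart.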
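Example 1. Assume that we have four different versions of defining the notion of
author (e.g., originating from four different knowledge bases): $\mathit{author}_1$(first-name,
middle-initial, last-name), $\mathit{author}_2$(first-name, last-name), $\mathit{author}_3$(last-name,
first-name, middle-initial), and $\mathit{author}_4$(full-name). One could define the
argument relations/mappings between these function symbols, e.g., as follows,
with some proximity degrees $\alpha_1, \ldots, \alpha_6$:

$\mathit{author}_1 \sim^{\{(1,1),(3,2)\}}_{\mathcal{R},\alpha_1} \mathit{author}_2$, $\mathit{author}_1 \sim^{\{(3,1),(1,2),(2,3)\}}_{\mathcal{R},\alpha_2} \mathit{author}_3$,
$\mathit{author}_1 \sim^{\{(1,1),(2,1),(3,1)\}}_{\mathcal{R},\alpha_3} \mathit{author}_4$, $\mathit{author}_2 \sim^{\{(1,2),(2,1)\}}_{\mathcal{R},\alpha_4} \mathit{author}_3$,
$\mathit{author}_2 \sim^{\{(1,1),(2,1)\}}_{\mathcal{R},\alpha_5} \mathit{author}_4$, $\mathit{author}_3 \sim^{\{(1,1),(2,1),(3,1)\}}_{\mathcal{R},\alpha_6} \mathit{author}_4$.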


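Proximity Relations over Terms. Each proximity relation $\mathcal{R}$ in this paper is
defined on $\mathcal{F} \cup \mathcal{V}$ such that $\mathcal{R}(f, x) = 0$ for all $f \in \mathcal{F}$ and $x \in \mathcal{V}$, and $\mathcal{R}(x, y) = 0$
for all $x \neq y$, $x, y \in \mathcal{V}$. We assume that $\mathcal{R}$ is strict: for all $w_1, w_2 \in \mathcal{F} \cup \mathcal{V}$, if
$\mathcal{R}(w_1, w_2) = 1$, then $w_1 = w_2$. Yet another assumption is that for each $f \in \mathcal{F}$,
its $(\mathcal{R}, \lambda)$-proximity class $\{g \mid \mathcal{R}(f, g) \geq \lambda\}$ is finite for any $\mathcal{R}$ and $\lambda$.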

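We extend such an $\mathcal{R}$ to terms from $T(\mathcal{F}, \mathcal{V})$ as follows:

(a) $\mathcal{R}(t, s) := 0$ if $\mathcal{R}(\mathrm{head}(s), \mathrm{head}(t)) = 0$;

(b) $\mathcal{R}(t, s) := 1$ if $t = s$ and $t, s \in \mathcal{V}$;

(c) $\mathcal{R}(t, s) := \mathcal{R}(f, g) \wedge \mathcal{R}(t_{i_1}, s_{j_1}) \wedge \cdots \wedge \mathcal{R}(t_{i_k}, s_{j_k})$ if $t = f(t_1, \ldots, t_n)$,
$s = g(s_1, \ldots, s_m)$, $f \sim^{\rho}_{\mathcal{R},\alpha} g$, and $\rho = \{(i_1, j_1), \ldots, (i_k, j_k)\}$.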


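If $\mathcal{R}(t, s) \geq \lambda$, we write $t \sim_{\mathcal{R},\lambda} s$. When $\lambda = 1$, the relation $\sim_{\mathcal{R},1}$ does not
depend on $\mathcal{R}$, because $\mathcal{R}$ is strict, and is just the syntactic equality $=$.
The $(\mathcal{R}, \lambda)$-proximity class of a term $t$ is $\mathrm{pc}_{\mathcal{R},\lambda}(t) := \{s \mid s \sim_{\mathcal{R},\lambda} t\}$.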
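The extension of $\mathcal{R}$ to terms can be sketched in Python as follows. The sketch makes these assumptions:
- Terms are encoded as tuples ("f", t1, ..., tn), with strings for variables, and are assumed to be free of anonymous variables (e.g., after deanon).
- $\wedge$ is read as the minimum, which is consistent with the degrees computed in Example 6.
- A hypothetical table lists each related pair of distinct symbols once, as (f, g) -> (degree, $\rho$). The identity relation for $f = g$ and $\rho^{-1}$ for the reversed pair follow from the conditions on argument relations.

The sample data are the symbol proximities of Example 6 in Sect. 4.1.

```python
def sym_prox(table, f, g):
    """Degree and argument relation for distinct symbols f and g."""
    if (f, g) in table:
        return table[(f, g)]
    if (g, f) in table:
        deg, rho = table[(g, f)]
        return deg, {(j, i) for i, j in rho}  # f ~ g with rho iff g ~ f with rho^-1
    return 0.0, set()


def term_prox(table, t, s):
    """R(t, s) following cases (a)-(c); the conjunction is taken as min."""
    if isinstance(t, str) or isinstance(s, str):
        return 1.0 if t == s else 0.0         # (b); R(f, x) = R(x, y) = 0 for x != y
    if t[0] == s[0]:
        # identity relation (each symbol is assumed to have a fixed arity)
        deg, rho = 1.0, {(i, i) for i in range(1, len(t))}
    else:
        deg, rho = sym_prox(table, t[0], s[0])
    if deg == 0.0:
        return 0.0                            # (a)
    for i, j in rho:                          # (c)
        deg = min(deg, term_prox(table, t[i], s[j]))
    return deg


def close(table, lam, t, s):
    """t ~_{R,lambda} s."""
    return term_prox(table, t, s) >= lam


# Symbol proximities of Example 6.
TABLE = {
    ("b", "c"): (0.5, set()), ("c", "d"): (0.6, set()),
    ("h", "f"): (0.7, {(1, 1), (3, 2), (4, 2)}),
    ("h", "g"): (0.8, {(1, 1), (3, 3)}),
}
a, b, c, d = ("a",), ("b",), ("c",), ("d",)
r = ("h", a, "y", c, b)  # position 2 is not related to any argument of f or g
print(term_prox(TABLE, r, ("f", a, b)), term_prox(TABLE, r, ("g", a, c, d)))  # 0.5 0.6
```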


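Generalizations. Given $\mathcal{R}$ and $\lambda$, a term $r$ is an $(\mathcal{R}, \lambda)$-generalization of (alternatively,
$(\mathcal{R}, \lambda)$-more general than) a term $t$, written as $r \precsim_{\mathcal{R},\lambda} t$, if there exists
a substitution $\sigma$ such that $\mathrm{deanon}(r)\sigma \sim_{\mathcal{R},\lambda} \mathrm{deanon}(t)$. The strict part of $\precsim_{\mathcal{R},\lambda}$
is denoted by $\prec_{\mathcal{R},\lambda}$, i.e., $r \prec_{\mathcal{R},\lambda} t$ if $r \precsim_{\mathcal{R},\lambda} t$ and not $t \precsim_{\mathcal{R},\lambda} r$.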

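Example 2. Given a proximity relation $\mathcal{R}$, a cut value $\lambda$, constants $a \sim^{\emptyset}_{\mathcal{R},\alpha_1} b$
and $b \sim^{\emptyset}_{\mathcal{R},\alpha_2} c$, binary function symbols $f$ and $h$, and a unary function symbol $g$
such that $h \sim^{\{(1,1),(1,2)\}}_{\mathcal{R},\alpha_3} f$ and $h \sim^{\{(1,1)\}}_{\mathcal{R},\alpha_4} g$ with $\alpha_i \geq \lambda$, $1 \leq i \leq 4$, we have

- $h(x, \_) \precsim_{\mathcal{R},\lambda} h(a, x)$, because $h(x, x')\{x \mapsto a, x' \mapsto x\} = h(a, x) \sim_{\mathcal{R},\lambda} h(a, x)$.
- $h(x, \_) \precsim_{\mathcal{R},\lambda} h(\_, x)$, because $h(x, x')\{x \mapsto y', x' \mapsto x\} = h(y', x) \sim_{\mathcal{R},\lambda} h(y', x)$.
- $h(x, x) \not\precsim_{\mathcal{R},\lambda} h(\_, x)$, because $h(x, x)\sigma \not\sim_{\mathcal{R},\lambda} h(y', x)$ for every substitution $\sigma$ (the term $x\sigma$ would have to be close to both $y'$ and $x$).
- $h(x, \_) \precsim_{\mathcal{R},\lambda} f(a, c)$, because $h(x, x')\{x \mapsto b\} = h(b, x') \sim_{\mathcal{R},\lambda} f(a, c)$.
- $h(x, \_) \precsim_{\mathcal{R},\lambda} g(c)$, because $h(x, x')\{x \mapsto c\} = h(c, x') \sim_{\mathcal{R},\lambda} g(c)$.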


The notion of syntactic generalization of a term is a special case of $(\mathcal{R}, \lambda)$-generalization
for $\lambda = 1$. We write $r \precsim t$ to indicate that $r$ is a syntactic
generalization of $t$. Its strict part is denoted by $\prec$.

Since $\mathcal{R}$ is strict, $r \precsim t$ is equivalent to $\mathrm{deanon}(r)\sigma = \mathrm{deanon}(t)$ for some $\sigma$
(note the syntactic equality here).
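Theorem 1. If $r \precsim t$ and $t \precsim_{\mathcal{R},\lambda} s$, then $r \precsim_{\mathcal{R},\lambda} s$.

Proof. $r \precsim t$ implies $\mathrm{deanon}(r)\sigma = \mathrm{deanon}(t)$ for some $\sigma$, while from $t \precsim_{\mathcal{R},\lambda} s$ we
have $\mathrm{deanon}(t)\vartheta \sim_{\mathcal{R},\lambda} \mathrm{deanon}(s)$ for some $\vartheta$. Then $\mathrm{deanon}(r)\sigma\vartheta \sim_{\mathcal{R},\lambda} \mathrm{deanon}(s)$,
which implies $r \precsim_{\mathcal{R},\lambda} s$.

Note that $r \precsim_{\mathcal{R},\lambda} t$ and $t \precsim_{\mathcal{R},\lambda} s$ do not, in general, imply $r \precsim_{\mathcal{R},\lambda} s$, because
$\sim_{\mathcal{R},\lambda}$ is not transitive.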


Definition 1 (Minimal complete set of $(\mathcal{R}, \lambda)$-generalizations). Given
$\mathcal{R}$, $\lambda$, $t_1$, and $t_2$, a set of terms $T$ is a complete set of $(\mathcal{R}, \lambda)$-generalizations of
$t_1$ and $t_2$ if

(a) every $r \in T$ is an $(\mathcal{R}, \lambda)$-generalization of $t_1$ and $t_2$,
(b) if $r'$ is an $(\mathcal{R}, \lambda)$-generalization of $t_1$ and $t_2$, then there exists $r \in T$ such
that $r' \precsim r$ (note that we use syntactic generalization here).

In addition, $T$ is minimal if it satisfies the following property:
(c) if $r, r' \in T$, $r \neq r'$, then neither $r \prec_{\mathcal{R},\lambda} r'$ nor $r' \prec_{\mathcal{R},\lambda} r$.

A minimal complete set of $(\mathcal{R}, \lambda)$-generalizations ($(\mathcal{R}, \lambda)$-mcsg) of two terms is
unique modulo variable renaming. The elements of the $(\mathcal{R}, \lambda)$-mcsg of $t_1$ and $t_2$
are called least general $(\mathcal{R}, \lambda)$-generalizations ($(\mathcal{R}, \lambda)$-lggs) of $t_1$ and $t_2$.

This definition directly extends to generalizations of finitely many terms. 


The problem of computing an $(\mathcal{R}, \lambda)$-generalization of terms $t$ and $s$ is called
the $(\mathcal{R}, \lambda)$-anti-unification problem of $t$ and $s$. In anti-unification, the goal is to
compute their least general $(\mathcal{R}, \lambda)$-generalizations.

The precise formulation of the anti-unification problem would be the following:
Given $\mathcal{R}$, $\lambda$, $t_1$, $t_2$, find an $(\mathcal{R}, \lambda)$-lgg $r$ of $t_1$ and $t_2$, substitutions $\sigma_1$, $\sigma_2$, and
the approximation degrees $\alpha_1$, $\alpha_2$ such that $\mathcal{R}(r\sigma_1, t_1) = \alpha_1$ and $\mathcal{R}(r\sigma_2, t_2) = \alpha_2$.
A minimal complete algorithm to solve this problem would compute exactly the
elements of the $(\mathcal{R}, \lambda)$-mcsg of $t_1$ and $t_2$ together with their approximation degrees.
However, as we will see below, it is problematic to solve the problem in this form.
Therefore, we consider a slightly modified variant that takes anonymous
variables in generalizations into account and relaxes the bounds on their degrees.

We assume that the terms to be generalized are ground. It is not a restriction 
because we can treat variables as constants that are close only to themselves. 

Recall that the proximity class of any alphabet symbol is finite. Also, the
symbols are related to each other by finitely many argument relations. One may
think that this leads to finite proximity classes of terms, but this is not the case.
Consider, e.g., $\mathcal{R}$ and $\lambda$ where $h \sim^{\{(1,1)\}}_{\mathcal{R},\alpha} f$ with $\alpha \geq \lambda$, binary $h$, and unary $f$. Then the
$(\mathcal{R}, \lambda)$-proximity class of $f(a)$ is infinite: $\{f(a)\} \cup \{h(a, t) \mid t \in T(\mathcal{F}, \mathcal{V})\}$. Also,
the $(\mathcal{R}, \lambda)$-mcsg of $f(a)$ and $f(b)$ is infinite: $\{f(x)\} \cup \{h(x, t) \mid t \in T(\mathcal{F}, \emptyset)\}$.


Definition 2. Given the terms $t_1, \ldots, t_n$, $n \geq 1$, a position $p$ in a term $r$ is
called irrelevant for $(\mathcal{R}, \lambda)$-generalizing (resp. for $(\mathcal{R}, \lambda)$-proximity to) $t_1, \ldots, t_n$
if $r[s]_p \precsim_{\mathcal{R},\lambda} t_i$ (resp. $r[s]_p \sim_{\mathcal{R},\lambda} t_i$) for all $1 \leq i \leq n$ and for all terms $s$.

We say that $r$ is a relevant $(\mathcal{R}, \lambda)$-generalization (resp. relevant $(\mathcal{R}, \lambda)$-proximal
term) of $t_1, \ldots, t_n$ if $r \precsim_{\mathcal{R},\lambda} t_i$ (resp. $r \sim_{\mathcal{R},\lambda} t_i$) for all $1 \leq i \leq n$ and
$r|_p = \_$ for all positions $p$ in $r$ that are irrelevant for generalizing (resp. for
proximity to) $t_1, \ldots, t_n$. The $(\mathcal{R}, \lambda)$-relevant proximity class of $t$ is

$\mathrm{rpc}_{\mathcal{R},\lambda}(t) := \{s \mid s \text{ is a relevant } (\mathcal{R}, \lambda)\text{-proximal term of } t\}$.


In the example above, position 2 in $h(x, t)$ is irrelevant for generalizing $f(a)$
and $f(b)$, and $h(x, \_)$ is one of their relevant generalizations. Note that $f(x)$
is also a relevant generalization of $f(a)$ and $f(b)$, since it contains no irrelevant
positions. More general generalizations, e.g., $x$, are relevant as well. Similarly,
position 2 in $h(a, t)$ is irrelevant for proximity to $f(a)$, and $\mathrm{rpc}_{\mathcal{R},\lambda}(f(a)) = \{f(a), h(a, \_)\}$.
In general, $\mathrm{rpc}_{\mathcal{R},\lambda}(t)$ is finite for any $t$ due to the finiteness of proximity
classes of symbols and argument relations mentioned above.


Definition 3 (Minimal complete set of relevant $(\mathcal{R}, \lambda)$-generalizations).
Given $\mathcal{R}$, $\lambda$, $t_1$, and $t_2$, a set of terms $T$ is a complete set of relevant $(\mathcal{R}, \lambda)$-generalizations
of $t_1$ and $t_2$ if

(a) every element of $T$ is a relevant $(\mathcal{R}, \lambda)$-generalization of $t_1$ and $t_2$, and
(b) if $r$ is a relevant $(\mathcal{R}, \lambda)$-generalization of $t_1$ and $t_2$, then there exists $r' \in T$
such that $r \precsim r'$.

The minimality property is defined as in Definition 1.

This definition directly extends to relevant generalizations of finitely many terms.
We use $(\mathcal{R}, \lambda)$-mcsrg as an abbreviation for minimal complete set of relevant
$(\mathcal{R}, \lambda)$-generalizations. Like relevant proximity classes, mcsrg's are also finite.


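Lemma 1. For given $\mathcal{R}$ and $\lambda$, if all argument relations are correspondence
relations, then $(\mathcal{R}, \lambda)$-mcsg's and $(\mathcal{R}, \lambda)$-proximity classes for all terms are finite.

Proof. Under correspondence relations, no term contains a position that is irrelevant
for generalization or for proximity. Therefore, every $(\mathcal{R}, \lambda)$-generalization and every
$(\mathcal{R}, \lambda)$-proximal term is relevant, and finiteness carries over from the relevant case.

Hence, for correspondence relations the notions of mcsg and mcsrg coincide,
as do the notions of proximity class and relevant proximity class.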

For a term $r$, we define its linearized version $\mathrm{lin}(r)$ as the term obtained
from $r$ by replacing each occurrence of a named variable in $r$ by a fresh one.
For instance, $\mathrm{lin}(f(x, \_, g(y, x, a), b)) = f(x', \_, g(y', x'', a), b)$, where $x'$, $x''$, $y'$
are fresh variables. Linearized versions of terms are unique modulo variable
renaming.
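Here is a direct Python rendering of lin. It uses an illustrative tuple encoding of terms: strings for variables, "_" for the anonymous variable, and ("f", t1, ..., tn) for applications. The primed names x'1, y'2, ... stand for the fresh variables and are assumed not to clash with names in the input.

```python
import itertools


def lin(t, counter=None):
    """Replace each occurrence of a named variable by a fresh one; _ stays as it is."""
    counter = itertools.count(1) if counter is None else counter
    if t == "_":
        return t
    if isinstance(t, str):
        return f"{t}'{next(counter)}"
    return (t[0],) + tuple(lin(a, counter) for a in t[1:])


def named_vars(t):
    """All occurrences of named variables, from left to right."""
    if t == "_":
        return []
    if isinstance(t, str):
        return [t]
    return [v for a in t[1:] for v in named_vars(a)]


# lin(f(x, _, g(y, x, a), b)) = f(x'1, _, g(y'2, x'3, a), b)
r = ("f", "x", "_", ("g", "y", "x", ("a",)), ("b",))
print(lin(r))
vs = named_vars(lin(r))
print(len(vs) == len(set(vs)))  # the result is linear: True
```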


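Definition 4 (Generalization degree upper bound). Given two terms $r$
and $t$, a proximity relation $\mathcal{R}$, and a $\lambda$-cut, the $(\mathcal{R}, \lambda)$-generalization degree
upper bound of $r$ and $t$, denoted by $\mathrm{gdub}_{\mathcal{R},\lambda}(r, t)$, is defined as follows:

Let $\alpha := \max\{\mathcal{R}(\mathrm{lin}(r)\sigma, t) \mid \sigma \text{ is a substitution}\}$. Then $\mathrm{gdub}_{\mathcal{R},\lambda}(r, t)$ is $\alpha$ if
$\alpha \geq \lambda$, and 0 otherwise.

Intuitively, $\mathrm{gdub}_{\mathcal{R},\lambda}(r, t) = \alpha$ means that no instance of $r$ can get closer to $t$
than $\alpha$ in $\mathcal{R}$. It follows from the definition that if $r \precsim_{\mathcal{R},\lambda} t$, then $0 < \lambda \leq \mathrm{gdub}_{\mathcal{R},\lambda}(r, t) \leq 1$,
and if $r \not\precsim_{\mathcal{R},\lambda} t$, then $\mathrm{gdub}_{\mathcal{R},\lambda}(r, t) = 0$.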

The upper bound computed by gdub is more relaxed than it would be if the 
linearization function were not used, but this is what we will be able to compute 
in our algorithms later. 

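Example 3. Let $\mathcal{R}(a, b) = 0.6$, $\mathcal{R}(b, c) = 0.7$, and $\lambda = 0.5$. Then $\mathrm{gdub}_{\mathcal{R},\lambda}(f(x, b), f(a, c)) = 0.7$
and $\mathrm{gdub}_{\mathcal{R},\lambda}(f(x, x), f(a, c)) = \mathrm{gdub}_{\mathcal{R},\lambda}(f(x, y), f(a, c)) = 1$.

It is not difficult to see that if $r\sigma \sim_{\mathcal{R},\lambda} t$, then $\mathcal{R}(r\sigma, t) \leq \mathrm{gdub}_{\mathcal{R},\lambda}(r, t)$. In
Example 3, for $\sigma = \{x \mapsto b\}$ we have
$\mathcal{R}(f(x, x)\sigma, f(a, c)) = \mathcal{R}(f(b, b), f(a, c)) = 0.6 < \mathrm{gdub}_{\mathcal{R},\lambda}(f(x, x), f(a, c)) = 1$.

We compute $\mathrm{gdub}_{\mathcal{R},\lambda}(r, t)$ as follows. If $r$ is a variable, then $\mathrm{gdub}_{\mathcal{R},\lambda}(r, t) = 1$.
Otherwise, if $\mathrm{head}(r) \sim^{\rho}_{\mathcal{R},\beta} \mathrm{head}(t)$ with $\beta \geq \lambda$, then
$\mathrm{gdub}_{\mathcal{R},\lambda}(r, t) = \beta \wedge \bigwedge_{(i,j) \in \rho} \mathrm{gdub}_{\mathcal{R},\lambda}(r|_i, t|_j)$. Otherwise, $\mathrm{gdub}_{\mathcal{R},\lambda}(r, t) = 0$.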
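The recursive computation of gdub can be sketched in Python. This is a hypothetical illustration under these assumptions:
- Terms use the tuple encoding: strings for variables, ("f", t1, ..., tn) for applications.
- The second argument $t$ is ground.
- A table lists each related pair of distinct symbols once, as (f, g) -> (degree, $\rho$).
- $\wedge$ is read as the minimum.
- Every symbol has a fixed arity.

The checks reproduce Example 3.

```python
def sym_prox(table, f, g, arity):
    """(degree, rho) for head symbols f and g; identity relation if f == g."""
    if f == g:
        return 1.0, {(i, i) for i in range(1, arity + 1)}
    if (f, g) in table:
        return table[(f, g)]
    if (g, f) in table:
        deg, rho = table[(g, f)]
        return deg, {(j, i) for i, j in rho}
    return 0.0, set()


def gdub(table, lam, r, t):
    """Generalization degree upper bound of r with respect to the ground term t."""
    if isinstance(r, str):             # named or anonymous variable
        return 1.0
    beta, rho = sym_prox(table, r[0], t[0], len(r) - 1)
    if beta < lam:                     # heads are not (R, lambda)-close
        return 0.0
    for i, j in rho:
        beta = min(beta, gdub(table, lam, r[i], t[j]))
    return beta


# Example 3: R(a, b) = 0.6, R(b, c) = 0.7, lambda = 0.5
TABLE = {("a", "b"): (0.6, set()), ("b", "c"): (0.7, set())}
t = ("f", ("a",), ("c",))
print(gdub(TABLE, 0.5, ("f", "x", ("b",)), t))   # 0.7
print(gdub(TABLE, 0.5, ("f", "x", "x"), t))      # 1.0
```

Every recursive result is either 0 or at least $\lambda$, and so is each head degree that passes the cut. Therefore the minimum is again 0 or at least $\lambda$, which matches the cut in Definition 4.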


3 Term Set Consistency 


The notion of term set consistency plays an important role in the computation
of proximal generalizations. Intuitively, a set of terms is $(\mathcal{R}, \lambda)$-consistent if all
the terms in the set have a common $(\mathcal{R}, \lambda)$-proximal term. In this section, we
discuss this notion and the corresponding algorithms.


Definition 5 (Consistent set of terms). A finite set of terms $T$ is $(\mathcal{R}, \lambda)$-consistent
if there exists a term $s$ such that $s \sim_{\mathcal{R},\lambda} t$ for all $t \in T$.

$(\mathcal{R}, \lambda)$-consistency of a finite term set $T$ is equivalent to $\bigcap_{t \in T} \mathrm{pc}_{\mathcal{R},\lambda}(t) \neq \emptyset$.
However, we cannot use this property to decide consistency, since proximity classes of
terms can be infinite when the argument relations are not restricted. For this
reason, we introduce the operation $\sqcap$ on terms as follows: (i) $t \sqcap \_ = \_ \sqcap t = t$,
(ii) $f(t_1, \ldots, t_n) \sqcap f(s_1, \ldots, s_n) = f(t_1 \sqcap s_1, \ldots, t_n \sqcap s_n)$, $n \geq 0$. Obviously, $\sqcap$ is
associative (A), commutative (C), idempotent (I), and has $\_$ as its unit element
(U). It can be extended to sets of terms: $T_1 \sqcap T_2 := \{t_1 \sqcap t_2 \mid t_1 \in T_1, t_2 \in T_2\}$, where only
the pairs for which $t_1 \sqcap t_2$ is defined contribute. It is easy to see that $\sqcap$ on sets also
satisfies the ACIU properties, with the set $\{\_\}$ playing the role of the unit element.
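Here is a minimal Python sketch of $\sqcap$. It assumes an illustrative encoding in which "_" is the anonymous variable and ground terms are tuples ("f", t1, ..., tn). Because $\sqcap$ is partial, the sketch returns None when $t \sqcap s$ is undefined, and the set version keeps only the defined results. The sample sets are the two relevant proximity classes listed in Example 4.

```python
def meet(t, s):
    """t ⊓ s for ground terms possibly containing _; None if undefined."""
    if t == "_":
        return s
    if s == "_":
        return t
    if isinstance(t, str) or isinstance(s, str) or t[0] != s[0] or len(t) != len(s):
        return None                     # outside the definition, e.g. different heads
    args = []
    for ti, si in zip(t[1:], s[1:]):
        m = meet(ti, si)
        if m is None:
            return None
        args.append(m)
    return (t[0],) + tuple(args)


def meet_sets(T1, T2):
    """T1 ⊓ T2: all defined t1 ⊓ t2 with t1 in T1 and t2 in T2."""
    return {m for t1 in T1 for t2 in T2 if (m := meet(t1, t2)) is not None}


a, b, c = ("a",), ("b",), ("c",)
rpc_fac = {("f", a, c), ("f", b, c), ("f", a, b), ("f", b, b), ("h", b, "_", "_")}
rpc_ga = {("g", a), ("g", b), ("h", "_", a, "_"), ("h", "_", b, "_")}
print(meet_sets(rpc_fac, rpc_ga))   # h(b, a, _) and h(b, b, _), as tuples
```

The assignment expression inside the comprehension requires Python 3.8 or later.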


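Lemma 2. A finite set of terms $T$ is $(\mathcal{R}, \lambda)$-consistent iff $\bigsqcap_{t \in T} \mathrm{rpc}_{\mathcal{R},\lambda}(t) \neq \emptyset$.

Proof. ($\Rightarrow$) If $s \sim_{\mathcal{R},\lambda} t$ for all $t \in T$, then $s_t \in \mathrm{rpc}_{\mathcal{R},\lambda}(t)$, where $s_t$ is obtained
from $s$ by replacing all subterms that are irrelevant for its $(\mathcal{R}, \lambda)$-proximity to $t$
by $\_$. Assume $T = \{t_1, \ldots, t_n\}$. Since all $s_{t_i}$ are obtained from the same $s$, their meet is defined,
and $s_{t_1} \sqcap \cdots \sqcap s_{t_n} \in \bigsqcap_{t \in T} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$.
($\Leftarrow$) Obvious, since $s \sim_{\mathcal{R},\lambda} t$ for $s \in \bigsqcap_{t \in T} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$ and for all $t \in T$.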


Now we design an algorithm $\mathfrak{C}$ that computes $\bigsqcap_{t \in T} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$ without actually
computing $\mathrm{rpc}_{\mathcal{R},\lambda}(t)$ for each $t \in T$. A special version of the algorithm can
be used to decide the $(\mathcal{R}, \lambda)$-consistency of $T$.

The algorithm is rule-based. The rules work on states, which are pairs $I; s$,
where $s$ is a term and $I$ is a finite set of expressions of the form $x\ \mathsf{in}\ T$, where
$T$ is a finite set of terms. $\mathcal{R}$ and $\lambda$ are given. There are two rules ($\uplus$ stands for
disjoint union):
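Rem: Removing the empty set
$\{x\ \mathsf{in}\ \emptyset\} \uplus I; s \Longrightarrow I; s\{x \mapsto \_\}$.

Red: Reduce a set to new sets
$\{x\ \mathsf{in}\ \{t_1, \ldots, t_m\}\} \uplus I; s \Longrightarrow \{y_1\ \mathsf{in}\ T_1, \ldots, y_n\ \mathsf{in}\ T_n\} \cup I; s\{x \mapsto h(y_1, \ldots, y_n)\}$,
where $m \geq 1$, $h$ is an $n$-ary function symbol such that $h \sim^{\rho_k}_{\mathcal{R},\gamma_k} \mathrm{head}(t_k)$ with
$\gamma_k \geq \lambda$ for all $1 \leq k \leq m$, and $T_i := \{t_k|_j \mid (i, j) \in \rho_k, 1 \leq k \leq m\}$, $1 \leq i \leq n$,
is the set of all those arguments of the terms $t_1, \ldots, t_m$ that are supposed to be
$(\mathcal{R}, \lambda)$-proximal to the $i$-th argument of $h$.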


To compute $\bigsqcap_{t \in T} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$, $\mathfrak{C}$ starts with $\{x\ \mathsf{in}\ T\}; x$ and applies the rules
as long as possible. Red causes branching. A state of the form $\emptyset; s$ is called
a success state. A failure state has the form $I; s$, to which no rule applies and
$I \neq \emptyset$. In the full derivation tree, each leaf is either a success state or a failure state.
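The following Python sketch computes $\bigsqcap_{t \in T} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$ in the spirit of $\mathfrak{C}$. It does not keep an explicit set $I$ of pending expressions. Instead, it recurses on each set $T_i$ produced by Red and combines the results with a Cartesian product, so every Red choice becomes a branch and an empty $T_i$ yields _ as in Rem. This is a hypothetical illustration under these assumptions:
- sig is a finite signature mapping symbols to arities.
- A table lists each related pair of distinct symbols once, as (f, g) -> (degree, $\rho$).
- Ground terms are tuple-encoded.

The sample data instantiate Example 4 with assumed degrees (0.6 and 0.7) and $\lambda = 0.5$.

```python
import itertools


def sym_prox(table, sig, f, g):
    if f == g:
        return 1.0, {(i, i) for i in range(1, sig[f] + 1)}
    if (f, g) in table:
        return table[(f, g)]
    if (g, f) in table:
        deg, rho = table[(g, f)]
        return deg, {(j, i) for i, j in rho}
    return 0.0, set()


def meet_rpc(T, sig, table, lam):
    """All terms of the ⊓ of rpc_{R,lam}(t) over t in T (T: finite set of ground terms)."""
    if not T:
        return {"_"}                              # Rem: "x in {}" becomes _
    results = set()
    for h, n in sig.items():                      # Red: branch on every suitable h
        links = [(t, *sym_prox(table, sig, h, t[0])) for t in T]
        if any(deg < lam for _, deg, _ in links):
            continue                              # h is not close to some head
        groups = [{t[j] for t, _, rho in links for (i2, j) in rho if i2 == i}
                  for i in range(1, n + 1)]       # T_1, ..., T_n of the Red rule
        choices = [meet_rpc(g, sig, table, lam) for g in groups]
        for args in itertools.product(*choices):  # empty if some T_i has no solution
            results.add((h,) + args)
    return results


# Example 4 with assumed degrees; lambda = 0.5
SIG = {"a": 0, "b": 0, "c": 0, "g": 1, "f": 2, "h": 3}
TABLE = {
    ("a", "b"): (0.6, set()), ("b", "c"): (0.6, set()),
    ("h", "f"): (0.7, {(1, 1), (1, 2)}), ("h", "g"): (0.7, {(2, 1)}),
}
T = {("f", ("a",), ("c",)), ("g", ("a",))}
print(meet_rpc(T, SIG, TABLE, 0.5))   # h(b, a, _) and h(b, b, _), as tuples
```

With a singleton $T = \{t\}$, the same function returns $\mathrm{rpc}_{\mathcal{R},\lambda}(t)$ itself, e.g., the five elements listed for $f(a, c)$ in Example 4.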


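Example 4. Assume $a, b, c$ are constants and $g, f, h$ are function symbols with the
arities 1, 2, and 3, respectively. Let $\lambda$ be given and $\mathcal{R}$ be defined so that $\mathcal{R}(a, b) \geq \lambda$,
$\mathcal{R}(b, c) \geq \lambda$, $h \sim^{\{(1,1),(1,2)\}}_{\mathcal{R},\beta} f$, and $h \sim^{\{(2,1)\}}_{\mathcal{R},\gamma} g$ with $\beta \geq \lambda$ and $\gamma \geq \lambda$. Then

$\mathrm{rpc}_{\mathcal{R},\lambda}(f(a, c)) = \{f(a, c), f(b, c), f(a, b), f(b, b), h(b, \_, \_)\}$,
$\mathrm{rpc}_{\mathcal{R},\lambda}(g(a)) = \{g(a), g(b), h(\_, a, \_), h(\_, b, \_)\}$,

and $\mathrm{rpc}_{\mathcal{R},\lambda}(f(a, c)) \sqcap \mathrm{rpc}_{\mathcal{R},\lambda}(g(a)) = \{h(b, a, \_), h(b, b, \_)\}$. We show how to
compute this set with $\mathfrak{C}$:

$\{x\ \mathsf{in}\ \{f(a, c), g(a)\}\}; x$
$\Longrightarrow_{\mathsf{Red}} \{y_1\ \mathsf{in}\ \{a, c\}, y_2\ \mathsf{in}\ \{a\}, y_3\ \mathsf{in}\ \emptyset\}; h(y_1, y_2, y_3)$
$\Longrightarrow_{\mathsf{Rem}} \{y_1\ \mathsf{in}\ \{a, c\}, y_2\ \mathsf{in}\ \{a\}\}; h(y_1, y_2, \_)$
$\Longrightarrow_{\mathsf{Red}} \{y_2\ \mathsf{in}\ \{a\}\}; h(b, y_2, \_)$.

Here there are two ways to apply Red to the last state. They lead to the two elements
$h(b, a, \_)$ and $h(b, b, \_)$ of $\mathrm{rpc}_{\mathcal{R},\lambda}(f(a, c)) \sqcap \mathrm{rpc}_{\mathcal{R},\lambda}(g(a))$.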


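Theorem 2. Given a finite set of terms $T$, the algorithm $\mathfrak{C}$ always terminates
starting from the state $\{x\ \mathsf{in}\ T\}; x$ (where $x$ is a fresh variable). If $S$ is the set
of success states produced at the end, then $\{s \mid \emptyset; s \in S\} = \bigsqcap_{t \in T} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$.

Proof. Termination: Associate to each state $\{x_1\ \mathsf{in}\ T_1, \ldots, x_n\ \mathsf{in}\ T_n\}; s$ the multiset
$\{d_1, \ldots, d_n\}$, where $d_i$ is the maximum depth of the terms occurring in $T_i$, and $d_i = 0$
if $T_i = \emptyset$. Compare these multisets by the Dershowitz-Manna ordering [5]. Each
rule strictly reduces them, which implies termination.

By the definitions of $\mathrm{rpc}_{\mathcal{R},\lambda}$ and $\sqcap$, $h(s_1, \ldots, s_n) \in \bigsqcap_{t \in \{t_1, \ldots, t_m\}} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$ iff
$h \sim^{\rho_k}_{\mathcal{R},\gamma_k} \mathrm{head}(t_k)$ with $\gamma_k \geq \lambda$ for all $1 \leq k \leq m$ and $s_i \in \bigsqcap_{t \in T_i} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$, where
$T_i = \{t_k|_j \mid (i, j) \in \rho_k, 1 \leq k \leq m\}$, $1 \leq i \leq n$. Therefore, in the Red rule,
the instance of $x$ (which is $h(y_1, \ldots, y_n)$) is in $\bigsqcap_{t \in \{t_1, \ldots, t_m\}} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$ iff for each
$1 \leq i \leq n$ we can find an instance of $y_i$ in $\bigsqcap_{t \in T_i} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$. If $T_i$ is empty,
the $i$-th argument of $h$ is irrelevant for the terms in $\{t_1, \ldots, t_m\}$ and can be
replaced by $\_$ (Rem does this in a subsequent step). Hence, in each success branch
of the derivation tree, the algorithm $\mathfrak{C}$ computes one element of $\bigsqcap_{t \in T} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$.
Branching at Red ensures that all elements of $\bigsqcap_{t \in T} \mathrm{rpc}_{\mathcal{R},\lambda}(t)$ are produced.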
It is easy to see how to use $\mathfrak{C}$ to decide the $(\mathcal{R}, \lambda)$-consistency of $T$: it is
enough to find one successful branch in the $\mathfrak{C}$-derivation tree for $\{x\ \mathsf{in}\ T\}; x$.
If there is no such branch, then $T$ is not $(\mathcal{R}, \lambda)$-consistent. In fact, during the
derivation we can even ignore the second component of the states.
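For the decision problem, the search can drop the terms being built and stop at the first successful branch. Below is a hypothetical Python sketch of this depth-first decision procedure. It assumes a finite signature sig mapping symbols to arities, a table listing each related pair of distinct symbols once as (f, g) -> (degree, $\rho$), and ground terms encoded as tuples ("f", t1, ..., tn). The checks use the constants of Example 6.

```python
def sym_prox(table, sig, f, g):
    if f == g:
        return 1.0, {(i, i) for i in range(1, sig[f] + 1)}
    if (f, g) in table:
        return table[(f, g)]
    if (g, f) in table:
        deg, rho = table[(g, f)]
        return deg, {(j, i) for i, j in rho}
    return 0.0, set()


def consistent(T, sig, table, lam):
    """Decide (R, lam)-consistency of a finite set T of ground terms."""
    T = list(T)
    if not T:
        return True                               # Rem always succeeds
    for h, n in sig.items():                      # try Red with each candidate h
        links = [(t, *sym_prox(table, sig, h, t[0])) for t in T]
        if any(deg < lam for _, deg, _ in links):
            continue
        if all(consistent({t[j] for t, _, rho in links for (i2, j) in rho if i2 == i},
                          sig, table, lam)
               for i in range(1, n + 1)):
            return True                           # one successful branch suffices
    return False


# Constants of Example 6: b ~ c (0.5), c ~ d (0.6), lambda = 0.5
SIG = {"a": 0, "b": 0, "c": 0, "d": 0}
TABLE = {("b", "c"): (0.5, set()), ("c", "d"): (0.6, set())}
print(consistent({("a",), ("b",)}, SIG, TABLE, 0.5))   # False
print(consistent({("b",), ("d",)}, SIG, TABLE, 0.5))   # True (common neighbor c)
```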


4 Solving Generalization Problems 


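Now we can reformulate the anti-unification problem that will be solved in the
remaining part of the paper. $\mathcal{R}$ is a proximity relation and $\lambda$ is a cut value.

Given: $\mathcal{R}$, $\lambda$, and the ground terms $t_1, \ldots, t_n$, $n \geq 2$.
Find: a set $S$ of tuples $(r, \sigma_1, \ldots, \sigma_n, \alpha_1, \ldots, \alpha_n)$ such that

- $\{r \mid (r, \ldots) \in S\}$ is an $(\mathcal{R}, \lambda)$-mcsrg of $t_1, \ldots, t_n$,
- $r\sigma_i \sim_{\mathcal{R},\lambda} t_i$ and $\alpha_i = \mathrm{gdub}_{\mathcal{R},\lambda}(r, t_i)$, $1 \leq i \leq n$, for each
$(r, \sigma_1, \ldots, \sigma_n, \alpha_1, \ldots, \alpha_n) \in S$.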


(When $n = 1$, this is the problem of computing a relevant proximity class of
a term.) Below we give a set of rules from which one can obtain algorithms to
solve the anti-unification problem for four versions of argument relations:

1. The most general (unrestricted) case: see algorithm $\mathfrak{A}_1$ below; the computed
set of generalizations is an mcsrg;

2. Correspondence relations: using the same algorithm $\mathfrak{A}_1$, the computed set of
generalizations is an mcsg;

3. Mappings: using a dedicated algorithm $\mathfrak{A}_2$, the computed set of generalizations
is an mcsrg;

4. Correspondence mappings (bijections): using the same algorithm $\mathfrak{A}_2$, the computed
set of generalizations is an mcsg.


Each of them also has a corresponding linear variant, which computes minimal
complete sets of (relevant) linear $(\mathcal{R}, \lambda)$-generalizations. They are denoted by
adding the superscript lin to the corresponding algorithm name: $\mathfrak{A}_1^{\mathrm{lin}}$ and $\mathfrak{A}_2^{\mathrm{lin}}$.

For simplicity, we formulate the algorithms for the case $n = 2$. They can be
extended to arbitrary $n$ straightforwardly.

The main data structure in these algorithms is an anti-unification triple
(AUT) $x : T_1 \triangleq T_2$, where $T_1$ and $T_2$ are finite consistent sets of ground terms.
The idea is that $x$ is a common generalization of all terms in $T_1 \cup T_2$. A configuration
is a tuple $A; S; r; \alpha_1; \alpha_2$, where $A$ is a set of AUTs to be solved, $S$ is a
set of solved AUTs (the store), $r$ is the generalization computed so far, and the
$\alpha$'s are the current approximations of the generalization degree upper bounds of $r$
for the input terms.

Before formulating the rules, we discuss one peculiarity of approximate
generalizations:

Example 5. For a given $\mathcal{R}$ and $\lambda$, assume $\mathcal{R}(a, b) \geq \lambda$, $\mathcal{R}(b, c) \geq \lambda$, $h \sim^{\{(1,1),(1,2)\}}_{\mathcal{R},\alpha} f$,
and $h \sim^{\{(1,1)\}}_{\mathcal{R},\beta} g$, where $f$ is binary, $g$ and $h$ are unary, $\alpha \geq \lambda$, and $\beta \geq \lambda$. Then

- $h(b)$ is an $(\mathcal{R}, \lambda)$-generalization of $f(a, c)$ and $g(a)$.
- $x$ is the only $(\mathcal{R}, \lambda)$-generalization of $f(a, d)$ and $g(a)$. One may be tempted
to have $h$ as the head of the generalization, e.g., $h(x)$. However, $x$ cannot be instantiated
by any term that would be $(\mathcal{R}, \lambda)$-close to both $a$ and $d$, since in the
given $\mathcal{R}$, $d$ is $(\mathcal{R}, \lambda)$-close only to itself. Hence, no instance of
$h(x)$ is $(\mathcal{R}, \lambda)$-close to $f(a, d)$. Since there is no other alternative (except
$h$) for the common neighbor of $f$ and $g$, the generalization should be a fresh
variable $x$.


This example shows that generalization algorithms should take into account not 
only the heads of the terms to be generalized, but also should look deeper, to 
make sure that the arguments grouped together by the given argument relation 
have a common neighbor. This justifies the requirement of consistency of a set 
of arguments, the notion introduced in the previous section and used in the 
decomposition rule below. 


4.1 Anti-unification for Unrestricted Argument Relations 


Algorithms $\mathfrak{A}_1^{\mathrm{lin}}$ and $\mathfrak{A}_1$ use the rules below to transform configurations into
configurations. Given $\mathcal{R}$, $\lambda$, and the ground terms $t_1$ and $t_2$, we create the initial
configuration $\{x : \{t_1\} \triangleq \{t_2\}\}; \emptyset; x; 1; 1$ and apply the rules as long as possible.
Note that the rules preserve consistency of AUTs. The process generates a finite 
complete tree of derivations, whose terminal nodes have configurations with the 
first component empty. We will show how from these terminal configurations one 
collects the result as required in the anti-unification problem statement. 


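Tri: Trivial
$\{x : \emptyset \triangleq \emptyset\} \uplus A; S; r; \alpha_1; \alpha_2 \Longrightarrow A; S; r\{x \mapsto \_\}; \alpha_1; \alpha_2$.

Dec: Decomposition
$\{x : T_1 \triangleq T_2\} \uplus A; S; r; \alpha_1; \alpha_2 \Longrightarrow$
$\{y_i : Q_{i1} \triangleq Q_{i2} \mid 1 \leq i \leq n\} \cup A; S; r\{x \mapsto h(y_1, \ldots, y_n)\}; \alpha_1 \wedge \beta_1; \alpha_2 \wedge \beta_2$,

where $T_1 \cup T_2 \neq \emptyset$; $h$ is $n$-ary with $n \geq 0$; $y_1, \ldots, y_n$ are fresh; and for $j = 1, 2$,
if $T_j = \{t^j_1, \ldots, t^j_{m_j}\}$, then
- $h \sim^{\rho^j_k}_{\mathcal{R},\gamma^j_k} \mathrm{head}(t^j_k)$ with $\gamma^j_k \geq \lambda$ for all $1 \leq k \leq m_j$, and $\beta_j = \gamma^j_1 \wedge \cdots \wedge \gamma^j_{m_j}$
(note that $\beta_j = 1$ if $m_j = 0$),
- for all $1 \leq i \leq n$, $Q_{ij} = \bigcup_{k=1}^{m_j} \{t^j_k|_q \mid (i, q) \in \rho^j_k\}$, and $Q_{ij}$ is $(\mathcal{R}, \lambda)$-consistent.

Sol: Solving
$\{x : T_1 \triangleq T_2\} \uplus A; S; r; \alpha_1; \alpha_2 \Longrightarrow A; \{x : T_1 \triangleq T_2\} \cup S; r; \alpha_1; \alpha_2$,
if the Tri and Dec rules are not applicable. (This means that at least one $T_j \neq \emptyset$ and
either there is no $h$ as required in the Dec rule, or at least one $Q_{ij}$ from Dec
is not $(\mathcal{R}, \lambda)$-consistent.)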
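The three rules can be turned into a search procedure. The following Python sketch is a hypothetical illustration of the Tri/Dec/Sol phase of $\mathfrak{A}_1^{\mathrm{lin}}$. It does not perform the expansion, compute witness substitutions, or apply Mer. At each step it selects the first AUT of $A$, explores every admissible choice of $h$ in Dec, and applies Sol when neither Tri nor Dec applies. It assumes:
- a finite signature sig mapping symbols to arities;
- a table listing each related pair of distinct symbols once, as (f, g) -> (degree, $\rho$);
- $\wedge$ read as the minimum;
- tuple-encoded ground terms;
- fresh variables named x0, x1, ....

The consistency test searches for one successful branch of $\mathfrak{C}$. The usage lines run the procedure on the data of Example 6.

```python
import itertools


def sym_prox(table, sig, f, g):
    if f == g:
        return 1.0, {(i, i) for i in range(1, sig[f] + 1)}
    if (f, g) in table:
        return table[(f, g)]
    if (g, f) in table:
        deg, rho = table[(g, f)]
        return deg, {(j, i) for i, j in rho}
    return 0.0, set()


def arg_sets(h, n, T, sig, table, lam):
    """(beta, [Q_1, ..., Q_n]) for h against the terms of T, or None if h is not close."""
    links = [(t, *sym_prox(table, sig, h, t[0])) for t in T]
    if any(deg < lam for _, deg, _ in links):
        return None
    beta = min((deg for _, deg, _ in links), default=1.0)   # beta = 1 if T is empty
    return beta, [frozenset(t[q] for t, _, rho in links for (i2, q) in rho if i2 == i)
                  for i in range(1, n + 1)]


def consistent(T, sig, table, lam):
    if not T:
        return True
    for h, n in sig.items():
        side = arg_sets(h, n, T, sig, table, lam)
        if side and all(consistent(Q, sig, table, lam) for Q in side[1]):
            return True
    return False


def subst(r, x, s):
    """r{x -> s}."""
    if r == x:
        return s
    if isinstance(r, str):
        return r
    return (r[0],) + tuple(subst(a, x, s) for a in r[1:])


def anti_unify_lin(t1, t2, sig, table, lam):
    """Final (store, r, alpha1, alpha2) of every branch from {x: {t1} ≜ {t2}}; {}; x; 1; 1."""
    fresh, finals = itertools.count(), []

    def run(A, S, r, a1, a2):
        if not A:
            finals.append((S, r, a1, a2))
            return
        (x, T1, T2), rest = A[0], A[1:]
        if not T1 and not T2:                                   # Tri
            run(rest, S, subst(r, x, "_"), a1, a2)
            return
        decs = []
        for h, n in sig.items():                                # admissible h for Dec
            left = arg_sets(h, n, T1, sig, table, lam)
            right = arg_sets(h, n, T2, sig, table, lam)
            if left and right and all(consistent(Q, sig, table, lam)
                                      for Q in left[1] + right[1]):
                decs.append((h, left, right))
        if not decs:                                            # Sol
            run(rest, S + [(x, T1, T2)], r, a1, a2)
        for h, (b1, Q1), (b2, Q2) in decs:                      # Dec, one branch per h
            ys = [f"x{next(fresh)}" for _ in Q1]
            new = [(y, q1, q2) for y, q1, q2 in zip(ys, Q1, Q2)]
            run(new + rest, S, subst(r, x, (h,) + tuple(ys)), min(a1, b1), min(a2, b2))

    run([("x", frozenset([t1]), frozenset([t2]))], [], "x", 1.0, 1.0)
    return finals


SIG = {"a": 0, "b": 0, "c": 0, "d": 0, "f": 2, "g": 3, "h": 4}
TABLE = {("b", "c"): (0.5, set()), ("c", "d"): (0.6, set()),
         ("h", "f"): (0.7, {(1, 1), (3, 2), (4, 2)}), ("h", "g"): (0.8, {(1, 1), (3, 3)})}
for store, r, a1, a2 in anti_unify_lin(("f", ("a",), ("b",)),
                                       ("g", ("a",), ("c",), ("d",)), SIG, TABLE, 0.5):
    print(r, a1, a2, store)
```

On these data, the sketch reaches the two final configurations reported in Example 6, with $h(a, \_, c, b)$ and $h(a, \_, c, c)$, degrees 0.5 and 0.6, and an empty store.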
Let expand be an expansion operation defined for sets of AUTs as

$\mathrm{expand}(S) := \{x : \bigsqcap_{t \in T_1} \mathrm{rpc}_{\mathcal{R},\lambda}(t) \triangleq \bigsqcap_{t \in T_2} \mathrm{rpc}_{\mathcal{R},\lambda}(t) \mid x : T_1 \triangleq T_2 \in S\}$.

Exhaustive application of the three rules above leads to configurations of the
form $\emptyset; S; r; \alpha_1; \alpha_2$, where $r$ is a linear term. These configurations are further
postprocessed by replacing $S$ with $\mathrm{expand}(S)$. We will use the letter $E$ for expanded
stores. Hence, terminal configurations obtained after the exhaustive rule application
and expansion have the form $\emptyset; E; r; \alpha_1; \alpha_2$, where $r$ is a linear term.$^1$
This is what Algorithm $\mathfrak{A}_1^{\mathrm{lin}}$ stops with.

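To an expanded store $E = \{y_1 : Q_{11} \triangleq Q_{12}, \ldots, y_n : Q_{n1} \triangleq Q_{n2}\}$ we associate
two sets of substitutions, $\Sigma_L(E)$ and $\Sigma_R(E)$, defined as follows: $\sigma \in \Sigma_L(E)$ (resp.
$\sigma \in \Sigma_R(E)$) iff $\mathrm{dom}(\sigma) = \{y_1, \ldots, y_n\}$ and $y_i\sigma \in Q_{i1}$ (resp. $y_i\sigma \in Q_{i2}$) for each
$1 \leq i \leq n$. We call them the sets of witness substitutions.

Configurations containing expanded stores are called expanded configurations.
From each expanded configuration $C = \emptyset; E; r; \alpha_1; \alpha_2$, we construct the set
$S(C) = \{(r, \sigma_1, \sigma_2, \alpha_1, \alpha_2) \mid \sigma_1 \in \Sigma_L(E), \sigma_2 \in \Sigma_R(E)\}$.

Given an anti-unification problem $\mathcal{R}$, $\lambda$, $t_1$, and $t_2$, the answer computed by
Algorithm $\mathfrak{A}_1^{\mathrm{lin}}$ is the set $S := \bigcup_{i=1}^{m} S(C_i)$, where $C_1, \ldots, C_m$ are all of the final
expanded configurations reached by $\mathfrak{A}_1^{\mathrm{lin}}$ for $\mathcal{R}$, $\lambda$, $t_1$, and $t_2$.$^2$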
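Since $\Sigma_L(E)$ and $\Sigma_R(E)$ pick one term per AUT independently, $S(C)$ is a Cartesian product. Here is a Python sketch of this construction. It assumes a hypothetical encoding of an expanded store as a list of triples $(y_i, Q_{i1}, Q_{i2})$, each made of a variable name and two sets of tuple-encoded terms. An empty store yields only the identity substitution, as in Example 6.

```python
import itertools


def witness_substitutions(E, side):
    """Sigma_L(E) (side=0) or Sigma_R(E) (side=1) for E = [(y_i, Q_i1, Q_i2), ...]."""
    names = [y for y, _, _ in E]
    pools = [sorted(qs[side], key=repr) for _, *qs in E]   # deterministic order
    return [dict(zip(names, choice)) for choice in itertools.product(*pools)]


def answer_tuples(E, r, alpha1, alpha2):
    """S(C) for an expanded configuration C = {}; E; r; alpha1; alpha2."""
    return [(r, s1, s2, alpha1, alpha2)
            for s1 in witness_substitutions(E, 0)
            for s2 in witness_substitutions(E, 1)]


# Two choices on the left and one on the right give two tuples.
E = [("y", {("a",), ("b",)}, {("c",)})]
for tup in answer_tuples(E, ("f", "y"), 0.7, 0.6):
    print(tup)
```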


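Example 6. Assume $a, b, c$, and $d$ are constants with $b \sim^{\emptyset}_{\mathcal{R},0.5} c \sim^{\emptyset}_{\mathcal{R},0.6} d$, and $f$,
$g$, and $h$ are binary, ternary, and quaternary function symbols, respectively, with
$h \sim^{\{(1,1),(3,2),(4,2)\}}_{\mathcal{R},0.7} f$ and $h \sim^{\{(1,1),(3,3)\}}_{\mathcal{R},0.8} g$. For the proximity relation $\mathcal{R}$ given in
this way and $\lambda = 0.5$, Algorithm $\mathfrak{A}_1^{\mathrm{lin}}$ performs the following steps to anti-unify
$f(a, b)$ and $g(a, c, d)$:

$\{x : \{f(a, b)\} \triangleq \{g(a, c, d)\}\}; \emptyset; x; 1; 1 \Longrightarrow_{\mathsf{Dec}}$
$\{x_1 : \{a\} \triangleq \{a\}, x_2 : \emptyset \triangleq \emptyset, x_3 : \{b\} \triangleq \{d\}, x_4 : \{b\} \triangleq \emptyset\}; \emptyset; h(x_1, x_2, x_3, x_4); 0.7; 0.8 \Longrightarrow_{\mathsf{Dec}}$
$\{x_2 : \emptyset \triangleq \emptyset, x_3 : \{b\} \triangleq \{d\}, x_4 : \{b\} \triangleq \emptyset\}; \emptyset; h(a, x_2, x_3, x_4); 0.7; 0.8 \Longrightarrow_{\mathsf{Tri}}$
$\{x_3 : \{b\} \triangleq \{d\}, x_4 : \{b\} \triangleq \emptyset\}; \emptyset; h(a, \_, x_3, x_4); 0.7; 0.8 \Longrightarrow_{\mathsf{Dec}}$
$\{x_4 : \{b\} \triangleq \emptyset\}; \emptyset; h(a, \_, c, x_4); 0.5; 0.6$.

Here Dec applies in two different ways, with the substitutions $\{x_4 \mapsto b\}$
and $\{x_4 \mapsto c\}$. This leads to two final configurations: $\emptyset; \emptyset; h(a, \_, c, b); 0.5; 0.6$ and
$\emptyset; \emptyset; h(a, \_, c, c); 0.5; 0.6$. The witness substitutions are the identity substitutions.
We have $\mathcal{R}(h(a, \_, c, b), f(a, b)) = 0.5$, $\mathcal{R}(h(a, \_, c, b), g(a, c, d)) = 0.6$,
$\mathcal{R}(h(a, \_, c, c), f(a, b)) = 0.5$, and $\mathcal{R}(h(a, \_, c, c), g(a, c, d)) = 0.6$.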

If we had A ~AG YOPE) F then the algorithm would perform only the 
Sol step, because in the attempt to apply Dec to the initial configuration, the set 


1 Note that no side of the AUTs in E in those configurations is empty due to the 
condition at the Decomposition rule requiring the Q;;’s to be (R, A)-consistent. 

2 If we are interested only in linear generalizations without witness substitutions, there 
is no need in computing expanded configurations in At, 


A Framework for Approximate Generalization in Quantitative Theories 589 


Qi: = {a,b} is inconsistent: rper (a) = {a}, rper a(b) = {b,c}, and, hence, 
rpcr (a) my rper a(b) = ©. 

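Algorithm $\mathfrak{A}_1$ is obtained by further transforming the expanded configurations
produced by $\mathfrak{A}_1^{\mathrm{lin}}$. This transformation is performed by applying the Merge
rule below as long as possible. Intuitively, its purpose is to make the linear
generalization obtained by $\mathfrak{A}_1^{\mathrm{lin}}$ less general by merging some variables.

Mer: Merge
$\emptyset; \{x_1 : R_{11} \triangleq R_{12}, x_2 : R_{21} \triangleq R_{22}\} \uplus E; r; \alpha_1; \alpha_2 \Longrightarrow$
$\emptyset; \{y : Q_1 \triangleq Q_2\} \cup E; r\sigma; \alpha_1; \alpha_2$,
where $Q_i = (R_{1i} \sqcap R_{2i}) \neq \emptyset$, $i = 1, 2$, $y$ is fresh, and $\sigma = \{x_1 \mapsto y, x_2 \mapsto y\}$.

The answer computed by $\mathfrak{A}_1$ is defined similarly to the answer computed by $\mathfrak{A}_1^{\mathrm{lin}}$.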
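The Mer rule only needs $\sqcap$ on sets. The Python sketch below is a hypothetical illustration that applies Mer exhaustively along one derivation branch. It always merges the first pair of AUTs whose left sets and right sets both have a nonempty $\sqcap$; choosing other pairs first would explore the other branches. It assumes:
- expanded stores are encoded as lists of triples (variable, left set, right set);
- ground terms are tuples, with "_" for the anonymous variable;
- fresh variables are named z0, z1, ....

The usage lines reproduce the merge step of Example 7.

```python
import itertools


def meet(t, s):
    """t ⊓ s for ground terms possibly containing _; None if undefined."""
    if t == "_":
        return s
    if s == "_":
        return t
    if isinstance(t, str) or isinstance(s, str) or t[0] != s[0] or len(t) != len(s):
        return None
    args = [meet(x, y) for x, y in zip(t[1:], s[1:])]
    return None if None in args else (t[0],) + tuple(args)


def meet_sets(T1, T2):
    return {m for t1 in T1 for t2 in T2 if (m := meet(t1, t2)) is not None}


def rename(r, mapping):
    if isinstance(r, str):
        return mapping.get(r, r)
    return (r[0],) + tuple(rename(a, mapping) for a in r[1:])


def merge_all(E, r):
    """Apply Mer as long as possible along one branch; return the new store and r."""
    fresh = itertools.count()
    while True:
        pair = next(((i, k) for i in range(len(E)) for k in range(i + 1, len(E))
                     if meet_sets(E[i][1], E[k][1]) and meet_sets(E[i][2], E[k][2])), None)
        if pair is None:
            return E, r
        i, k = pair
        (x1, r11, r12), (x2, r21, r22) = E[i], E[k]
        y = f"z{next(fresh)}"
        rest = [e for j, e in enumerate(E) if j not in (i, k)]
        E = rest + [(y, meet_sets(r11, r21), meet_sets(r12, r22))]
        r = rename(r, {x1: y, x2: y})


a, b = ("a",), ("b",)
E7 = [
    ("y1", {("f1", a), ("h1", a, "_", "_")}, {("f2", a), ("h2", a, "_", "_")}),
    ("y2", {("g1", b), ("h1", "_", b, "_")}, {("g2", b), ("h2", "_", b, "_")}),
]
E_new, r_new = merge_all(E7, ("p", "y1", "y2"))
print(r_new, E_new)   # p(z0, z0) with z0 : {h1(a, b, _)} ≜ {h2(a, b, _)}
```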


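Example 7. Assume $a, b$ are constants, $f_1$, $f_2$, $g_1$, and $g_2$ are unary function
symbols, $p$ is a binary function symbol, and $h_1$ and $h_2$ are ternary function
symbols. Let $\lambda$ be a cut value and $\mathcal{R}$ be defined as $f_i \sim^{\{(1,1)\}}_{\mathcal{R},\alpha_i} h_i$ and $g_i \sim^{\{(1,2)\}}_{\mathcal{R},\beta_i} h_i$
with $\alpha_i \geq \lambda$, $\beta_i \geq \lambda$, $i = 1, 2$. To generalize $p(f_1(a), g_1(b))$ and $p(f_2(a), g_2(b))$,
we use $\mathfrak{A}_1$. The derivation starts as

$\{x : \{p(f_1(a), g_1(b))\} \triangleq \{p(f_2(a), g_2(b))\}\}; \emptyset; x; 1; 1 \Longrightarrow_{\mathsf{Dec}}$
$\{y_1 : \{f_1(a)\} \triangleq \{f_2(a)\}, y_2 : \{g_1(b)\} \triangleq \{g_2(b)\}\}; \emptyset; p(y_1, y_2); 1; 1 \Longrightarrow^2_{\mathsf{Sol}}$
$\emptyset; \{y_1 : \{f_1(a)\} \triangleq \{f_2(a)\}, y_2 : \{g_1(b)\} \triangleq \{g_2(b)\}\}; p(y_1, y_2); 1; 1$.

At this stage, we expand the store, obtaining

$\emptyset; \{y_1 : \{f_1(a), h_1(a, \_, \_)\} \triangleq \{f_2(a), h_2(a, \_, \_)\},$
$\quad y_2 : \{g_1(b), h_1(\_, b, \_)\} \triangleq \{g_2(b), h_2(\_, b, \_)\}\}; p(y_1, y_2); 1; 1$.

If we had the standard intersection $\cap$ in the Mer rule, we would not be able to
merge $y_1$ and $y_2$, because the obtained sets in the corresponding AUTs are disjoint.
However, Mer uses $\sqcap$. We have $\{f_i(a), h_i(a, \_, \_)\} \sqcap \{g_i(b), h_i(\_, b, \_)\} = \{h_i(a, b, \_)\}$, $i = 1, 2$,
and, therefore, we can make the step

$\emptyset; \{y_1 : \{f_1(a), h_1(a, \_, \_)\} \triangleq \{f_2(a), h_2(a, \_, \_)\},$
$\quad y_2 : \{g_1(b), h_1(\_, b, \_)\} \triangleq \{g_2(b), h_2(\_, b, \_)\}\}; p(y_1, y_2); 1; 1 \Longrightarrow_{\mathsf{Mer}}$
$\emptyset; \{z : \{h_1(a, b, \_)\} \triangleq \{h_2(a, b, \_)\}\}; p(z, z); 1; 1$.

Indeed, if we take the witness substitutions $\sigma_i = \{z \mapsto h_i(a, b, \_)\}$, $i = 1, 2$, and
apply them to the obtained generalization, we get

$p(z, z)\sigma_1 = p(h_1(a, b, \_), h_1(a, b, \_)) \sim_{\mathcal{R},\lambda} p(f_1(a), g_1(b))$,
$p(z, z)\sigma_2 = p(h_2(a, b, \_), h_2(a, b, \_)) \sim_{\mathcal{R},\lambda} p(f_2(a), g_2(b))$.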
Theorem 3. Given $\mathcal{R}$, $\lambda$, and the ground terms $t_1$ and $t_2$, Algorithm $\mathfrak{A}_1$
terminates for $\{x : \{t_1\} \triangleq \{t_2\}\}; \emptyset; x; 1; 1$ and computes an answer set $S$ such
that

1. the set $\{r \mid (r, \sigma_1, \sigma_2, \alpha_1, \alpha_2) \in S\}$ is an $(\mathcal{R}, \lambda)$-mcsrg of $t_1$ and $t_2$,
2. for each $(r, \sigma_1, \sigma_2, \alpha_1, \alpha_2) \in S$ we have $\mathcal{R}(r\sigma_i, t_i) \leq \alpha_i = \mathrm{gdub}_{\mathcal{R},\lambda}(r, t_i)$, $i = 1, 2$.


Proof. Termination: Define the depth of an AUT $x : \{t_1, \ldots, t_m\} \triangleq \{s_1, \ldots, s_n\}$
as the depth of the term $f(g(t_1, \ldots, t_m), h(s_1, \ldots, s_n))$. The rules Tri, Dec,
and Sol strictly reduce the multiset of depths of AUTs in the first component
of the configurations. Mer strictly reduces the number of distinct variables in
generalizations. Hence, these rules cannot be applied infinitely often, and $\mathfrak{A}_1$
terminates.


In order to prove (1), we need to verify three properties:

- Soundness: If $(r, \sigma_1, \sigma_2, \alpha_1, \alpha_2) \in S$, then $r$ is a relevant $(\mathcal{R}, \lambda)$-generalization
of $t_1$ and $t_2$.

- Completeness: If $r'$ is a relevant $(\mathcal{R}, \lambda)$-generalization of $t_1$ and $t_2$, then there
exists $(r, \sigma_1, \sigma_2, \alpha_1, \alpha_2) \in S$ such that $r' \precsim r$.

- Minimality: If $r$ and $r'$ belong to two tuples from $S$ such that $r \neq r'$, then
neither $r \prec_{\mathcal{R},\lambda} r'$ nor $r' \prec_{\mathcal{R},\lambda} r$.


Soundness: We show that each rule transforms an $(\mathcal{R}, \lambda)$-generalization into an
$(\mathcal{R}, \lambda)$-generalization. Since we start from a most general $(\mathcal{R}, \lambda)$-generalization
of $t_1$ and $t_2$ (a fresh variable $x$), at the end of the algorithm we get an
$(\mathcal{R}, \lambda)$-generalization of $t_1$ and $t_2$. We also show that in this process all irrelevant
positions are abstracted by anonymous variables, which guarantees that each
computed generalization is relevant.


Dec: The computed $h$ is $(\mathcal{R}, \lambda)$-close to the head of each term in $T_1 \cup T_2$. The $Q_{ij}$'s
correspond to argument relations between $h$ and those heads, and each $Q_{ij}$ is
$(\mathcal{R}, \lambda)$-consistent, i.e., there exists a term that is $(\mathcal{R}, \lambda)$-close to each term in
$Q_{ij}$. This implies that the term $h(y_1, \ldots, y_n)$ introduced for $x$ $(\mathcal{R}, \lambda)$-generalizes all the terms from
$T_1 \cup T_2$. Note that at this stage, $h(y_1, \ldots, y_n)$ might not yet be a relevant $(\mathcal{R}, \lambda)$-generalization
of $T_1$ and $T_2$: if there exists a position $1 \leq i \leq n$ that is irrelevant for
the $(\mathcal{R}, \lambda)$-generalization of $T_1$ and $T_2$, then in the new configuration we
have an AUT $y_i : \emptyset \triangleq \emptyset$.


Tri: When Dec generates $y : \emptyset \triangleq \emptyset$, the Tri rule replaces $y$ by $\_$ in the computed
generalization, making it relevant.

Sol does not change generalizations. 

Mer merges AUTs whose terms have nonempty intersection of rpc’s. Hence, 
we can reuse the same variable in the corresponding positions in generalizations, 
i.e., Mer transforms a generalization computed so far into a less general one. 


Completeness: We prove a slightly more general statement. Given two finite
consistent sets of ground terms $T_1$ and $T_2$, if $r'$ is a relevant $(\mathcal{R}, \lambda)$-generalization
for all $t_1 \in T_1$ and $t_2 \in T_2$, then starting from $\{x : T_1 \triangleq T_2\}; \emptyset; x; 1; 1$, Algorithm
$\mathfrak{A}_1$ computes a tuple $(r, \sigma_1, \sigma_2, \alpha_1, \alpha_2)$ such that $r' \precsim r$.

We may assume w.l.o.g. that $r'$ is a relevant $(\mathcal{R}, \lambda)$-lgg. Due to the transitivity
of $\precsim$, completeness for such an $r'$ implies completeness for all terms more general than $r'$.
We proceed by structural induction on $r'$. If $r'$ is a (named or anonymous)
variable, the statement holds. Assume $r' = h(r'_1, \ldots, r'_n)$, $T_1 = \{u_1, \ldots, u_m\}$,
and $T_2 = \{w_1, \ldots, w_l\}$. Then $h$ is such that $h \sim^{\rho_i}_{\mathcal{R},\gamma_i} \mathrm{head}(u_i)$ for all $1 \leq i \leq m$
and $h \sim^{\mu_j}_{\mathcal{R},\delta_j} \mathrm{head}(w_j)$ for all $1 \leq j \leq l$. Moreover, each $r'_k$ is a relevant $(\mathcal{R}, \lambda)$-generalization
of $Q_{k1} = \bigcup_{i=1}^{m} \{u_i|_q \mid (k, q) \in \rho_i\}$ and $Q_{k2} = \bigcup_{j=1}^{l} \{w_j|_q \mid (k, q) \in \mu_j\}$,
and, hence, $Q_{k1}$ and $Q_{k2}$ are $(\mathcal{R}, \lambda)$-consistent. Therefore, we can perform a
Dec step, choosing $h(y_1, \ldots, y_n)$ as the generalization term and $y_i : Q_{i1} \triangleq Q_{i2}$
as the new AUTs. By the induction hypothesis, for each $1 \leq i \leq n$ we can
compute a relevant $(\mathcal{R}, \lambda)$-generalization $r_i$ of $Q_{i1}$ and $Q_{i2}$ such that $r'_i \precsim r_i$.

If $r'$ is linear, then combining the current Dec step with the derivations
that lead to those $r_i$'s computes a tuple $(r, \ldots) \in S$, where $r = h(r_1, \ldots, r_n)$,
and, hence, $r' \precsim r$.

If $r'$ is non-linear, assume without loss of generality that all occurrences of a
shared variable $z$ appear as direct arguments of $h$: $z = r'_{k_1} = \cdots = r'_{k_p}$, for
$1 \leq k_1 < \cdots < k_p \leq n$. Since $r'$ is an lgg, $Q_{k_i 1}$ and $Q_{k_i 2}$ cannot be generalized by
a non-variable term; thus, Tri and Dec are not applicable. Therefore, the AUTs
$y_{k_i} : Q_{k_i 1} \triangleq Q_{k_i 2}$ are transformed by Sol. Since all pairs $Q_{k_i 1}$ and $Q_{k_i 2}$,
$1 \leq i \leq p$, are generalized by the same variable, we have $\bigsqcap_{t \in Q_j} \mathrm{rpc}_{\mathcal{R},\lambda}(t) \neq \emptyset$,
where $Q_j = \bigcup_{i=1}^{p} Q_{k_i j}$, $j = 1, 2$. Additionally, $r'_{k_1}, \ldots, r'_{k_p}$ are all occurrences of $z$
in $r'$. Hence, the condition of Mer is satisfied, and we can extend our derivation
with a $(p-1)$-fold application of this rule, obtaining $r = h(r_1, \ldots, r_n)$ with
$z = r_{k_1} = \cdots = r_{k_p}$, which implies $r' \precsim r$.


Minimality: Alternative generalizations are obtained by branching in Dec or Mer.
If the current generalization $r$ is transformed by Dec into two generalizations
$r_1$ and $r_2$ on two branches, then $r_1 = h_1(y_1, \ldots, y_m)$ and $r_2 = h_2(z_1, \ldots, z_n)$
for some $h$'s and fresh $y$'s and $z$'s. It may happen that $r_1 \precsim_{\mathcal{R},\lambda} r_2$ or vice
versa (if $h_1$ and $h_2$ are $(\mathcal{R}, \lambda)$-close to each other), but neither $r_1 \prec_{\mathcal{R},\lambda} r_2$ nor
$r_2 \prec_{\mathcal{R},\lambda} r_1$ holds. Hence, the set of generalizations computed before applying
Mer is minimal. Mer groups AUTs together maximally, and different groupings
are not comparable. Therefore, variables in generalizations are merged so that
distinct generalizations are not $\prec_{\mathcal{R},\lambda}$-comparable. Hence, (1) is proven.

As for (2), for $i = 1, 2$, it follows from the construction in Dec that $\mathcal{R}(r\sigma_i, t_i) \leq \alpha_i$.
Mer does not change $\alpha_i$. Thus, $\alpha_i = \mathrm{gdub}_{\mathcal{R},\lambda}(r, t_i)$ also holds, since the way $\alpha_i$
is computed corresponds exactly to the computation of $\mathrm{gdub}_{\mathcal{R},\lambda}(r, t_i)$: $r \precsim_{\mathcal{R},\lambda} t_i$,
and only the decomposition changes the degree during the computation.


The corollary below is proved similarly to Theorem 3: 


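Corollary 1. Given $\mathcal{R}$, $\lambda$, and the ground terms $t_1$ and $t_2$, Algorithm $\mathfrak{A}_1^{\mathrm{lin}}$
terminates for $\{x : \{t_1\} \triangleq \{t_2\}\}; \emptyset; x; 1; 1$ and computes an answer set $S$ such
that

1. the set $\{r \mid (r, \sigma_1, \sigma_2, \alpha_1, \alpha_2) \in S\}$ is a minimal complete set of relevant linear
$(\mathcal{R}, \lambda)$-generalizations of $t_1$ and $t_2$,

2. for each $(r, \sigma_1, \sigma_2, \alpha_1, \alpha_2) \in S$ we have $\mathcal{R}(r\sigma_i, t_i) \leq \alpha_i = \mathrm{gdub}_{\mathcal{R},\lambda}(r, t_i)$, $i = 1, 2$.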
4.2 Anti-unification with Correspondence Argument Relations


Correspondence relations ensure that, for a pair of proximal symbols, no
argument is irrelevant for proximity. Left- and right-totality of those relations
guarantee that each argument of a term is close to at least one argument of its
proximal term and that the inverse relation remains a correspondence relation.
Consequently, in the Dec rule of $\mathfrak{A}_1$, the sets $Q_{ij}$ never become empty. Therefore, the Tri
rule becomes obsolete and no anonymous variable appears in generalizations. As
a result, the $(\mathcal{R}, \lambda)$-mcsrg and the $(\mathcal{R}, \lambda)$-mcsg coincide, and the algorithm computes
a solution from which we get an $(\mathcal{R}, \lambda)$-mcsg for the given anti-unification
problem. The linear version $\mathfrak{A}_1^{\mathrm{lin}}$ works analogously.


4.3 Anti-unification with Argument Mappings 


When the argument relations are mappings, we are able to design a more constructive
method for computing generalizations and their degree bounds. (Recall
that our mappings are partial injective functions, which guarantees that their
inverses are also mappings.) We denote this algorithm by $\mathfrak{A}_2$. The configurations
stay the same as before, but the AUTs in $A$ contain only empty or singleton
sets of terms. In the store, we may still get (after the expansion) AUTs with
term sets containing more than one element. Only the Dec rule differs from its
previous counterpart, having a simpler condition:


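Dec: Decomposition
$\{x : T_1 \triangleq T_2\} \uplus A;\ S;\ r;\ \alpha_1;\ \alpha_2 \Longrightarrow$
$\{y_i : Q_{i1} \triangleq Q_{i2} \mid 1 \leq i \leq n\} \cup A;\ S;\ r\{x \mapsto h(y_1, \ldots, y_n)\};\ \alpha_1 \wedge \beta_1;\ \alpha_2 \wedge \beta_2$,
where $T_1 \cup T_2 \neq \emptyset$; $h$ is $n$-ary with $n \geq 0$; $y_1, \ldots, y_n$ are fresh; for $j = 1, 2$ and for all $1 \leq i \leq n$, if $T_j = \{t_j\}$ then $h \sim^{\pi_j}_{\mathcal{R},\beta_j} \mathit{head}(t_j)$ and $Q_{ij} = \{t_j|_{\pi_j(i)}\}$, and if $T_j = \emptyset$ then $\beta_j = 1$ and $Q_{ij} = \emptyset$.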


This Dec rule is equivalent to the special case of Dec for argument relations where $m_j \leq 1$. The new $Q_{ij}$'s contain at most one element (due to mappings) and, thus, are always $(\mathcal{R}, \lambda)$-consistent. Various choices of $h$ in Dec and alternatives in grouping AUTs in Mer cause branching in the same way as in $\mathfrak{A}_1$. It is easy to see that the counterparts of Theorem 3 hold for $\mathfrak{A}_2$ and $\mathfrak{A}_2^{\mathit{lin}}$ as well.
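A minimal sketch of one side of the mapping-case Dec rule, assuming hypothetical encodings: terms as (head, args) tuples, proximity as a dictionary of degrees, and a mapping as a dictionary from argument positions of h to argument positions of head(t_j). Two further assumptions are labelled in the code: an undefined π_j(i) yields an empty Q_ij, and ∧ is read as min when updating α_j.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical term encoding: (head symbol, tuple of argument terms).
Term = Tuple[str, tuple]


def proximity(prox: Dict[Tuple[str, str], float], f: str, g: str) -> float:
    """Symmetric, reflexive lookup of the proximity degree R(f, g)."""
    if f == g:
        return 1.0
    return prox.get((f, g), prox.get((g, f), 0.0))


def dec_side(h: str, n: int, t: Optional[Term],
             prox: Dict[Tuple[str, str], float],
             pi: Dict[int, int], lam: float
             ) -> Optional[Tuple[List[List[Term]], float]]:
    """Side j of the mapping-case Dec rule: returns (Q_1j..Q_nj, beta_j).

    t is the single term t_j, or None when T_j is empty. pi maps argument
    positions of h (1-based) to argument positions of head(t_j).
    """
    if t is None:
        return [[] for _ in range(n)], 1.0   # T_j empty: Q_ij empty, beta_j = 1
    head, args = t
    beta = proximity(prox, h, head)
    if beta < lam:
        return None                          # h is not (R, lambda)-close to head(t_j)
    # Assumption: an undefined pi(i) yields an empty Q_ij.
    qs = [[args[pi[i] - 1]] if i in pi else [] for i in range(1, n + 1)]
    return qs, beta


a, b = ("a", ()), ("b", ())
qs, beta = dec_side("f", 2, ("g", (a, b)), {("f", "g"): 0.7}, {1: 2, 2: 1}, 0.5)
print(qs, beta)          # [[('b', ())], [('a', ())]] 0.7
alpha1 = min(1.0, beta)  # assumption: alpha_1 /\ beta_1 computed with min
```

Every Q_ij produced here has at most one element, which is why no separate consistency check is needed in this case.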

A special case of this fragment of anti-unification is anti-unification for sim- 
ilarity relations in fully fuzzy signatures from [1]. Similarity relations are min- 
transitive proximity relations. The position mappings in [1] can be modeled by 
our argument mappings, requiring them to be total for symbols of the smaller 
arity and to satisfy the similarity-specific consistency restrictions from [1]. 
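Min-transitivity is the property that separates similarity relations from general proximity relations. The following sketch checks it by brute force over a small, hypothetical set of symbols; degrees of unlisted pairs are taken to be 0.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Prox = Dict[Tuple[str, str], float]


def degree(R: Prox, f: str, g: str) -> float:
    """Reflexive, symmetric reading of a proximity relation stored on unordered pairs."""
    if f == g:
        return 1.0
    return R.get((f, g), R.get((g, f), 0.0))


def is_similarity(R: Prox, symbols: Iterable[str]) -> bool:
    """Min-transitivity: R(f, h) >= min(R(f, g), R(g, h)) for all f, g, h."""
    syms = list(symbols)
    return all(degree(R, f, h) >= min(degree(R, f, g), degree(R, g, h))
               for f, g, h in product(syms, repeat=3))


proximity_only = {("f", "g"): 0.8, ("g", "h"): 0.6}
print(is_similarity(proximity_only, "fgh"))                       # False: R(f, h) = 0
print(is_similarity({**proximity_only, ("f", "h"): 0.6}, "fgh"))  # True
```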


4.4 Anti-unification with Correspondence Argument Mappings 


Correspondence argument mappings are bijections between arguments of function symbols of the same arity. For such mappings, if $h \sim^{\pi}_{\mathcal{R},\beta} f$ and $h$ is $n$-ary, then $f$ is also $n$-ary and $\pi$ is a permutation of $(1, \ldots, n)$. Hence, in this case $\mathfrak{A}_3$ combines the properties of $\mathfrak{A}_1$ for correspondence relations (Sect. 4.2) and of $\mathfrak{A}_2$ for argument mappings (Sect. 4.3): all generalizations are relevant, the computed answer gives an mcsg of the input terms, and the algorithm works with term sets of cardinality at most 1.


5 Remarks About the Complexity 


The proximity relation R can be naturally represented as an undirected graph, where the vertices are function symbols and an edge between them indicates that they are proximal. Graphs induced by proximity relations are usually sparse. Therefore, we can represent them by (sorted) adjacency lists. In the adjacency lists, we can also accommodate the argument relations and proximity degrees. In the rest of this section we use the following notation:


- $n$: the size of the input (number of symbols) of the corresponding algorithms,
- $\Delta$: the maximum degree of R considered as a graph,
- $a$: the maximum arity of function symbols that occur in R,
- $m^{\circ n}$: a function defined on natural numbers $m$ and $n$ such that $1^{\circ n} = n$ and $m^{\circ n} = m^n$ for $m \neq 1$.
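The function m^{○n} lets a single expression cover both a = 1 and a ≥ 2 in bounds of the form a^0 + a^1 + ... + a^{n-1}: the sum is exactly n when a = 1 and stays below a^n otherwise. A quick numeric check in Python (the function names are ours):

```python
def circ_pow(m: int, n: int) -> int:
    """m^{circ n}: equals n when m == 1 and m**n otherwise."""
    return n if m == 1 else m ** n


def geometric_sum(a: int, n: int) -> int:
    """a^0 + a^1 + ... + a^(n-1)."""
    return sum(a ** i for i in range(n))


# For a = 1 the sum is exactly n; for a >= 2 it is (a^n - 1)/(a - 1) < a^n.
for a in (1, 2, 3):
    print(a, [geometric_sum(a, n) for n in range(1, 5)],
          [circ_pow(a, n) for n in range(1, 5)])
```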


We assume that the given anti-unification problem is represented as a com- 
pletely shared directed acyclic graph (dag). Each node of the dag has a pointer 
to the adjacency list (with respect to R) of the symbol in the node. 
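One possible in-memory layout for the graph representation described above, sketched in Python under our own naming: sorted adjacency lists per symbol, plus a side table holding the proximity degree and the argument relation for each directed pair. Δ then corresponds to the longest adjacency list.

```python
from bisect import insort
from collections import defaultdict
from typing import Dict, List, Set, Tuple

ArgRel = Set[Tuple[int, int]]


class ProximityGraph:
    """Proximity relation R as an undirected graph with sorted adjacency lists.

    Each directed entry (f, g) also stores the proximity degree and the
    argument relation from f to g; reflexive pairs are left implicit.
    """

    def __init__(self) -> None:
        self.adj: Dict[str, List[str]] = defaultdict(list)
        self.info: Dict[Tuple[str, str], Tuple[float, ArgRel]] = {}

    def add(self, f: str, g: str, degree: float, rel: ArgRel) -> None:
        if f == g or (f, g) in self.info:
            return                      # skip reflexive pairs and duplicate edges
        insort(self.adj[f], g)          # keep adjacency lists sorted
        insort(self.adj[g], f)
        self.info[(f, g)] = (degree, set(rel))
        self.info[(g, f)] = (degree, {(j, i) for i, j in rel})  # inverse relation

    def neighbours(self, f: str) -> List[str]:
        return self.adj.get(f, [])

    def max_degree(self) -> int:
        """Delta in the complexity bounds: the longest adjacency list."""
        return max((len(ns) for ns in self.adj.values()), default=0)


G = ProximityGraph()
G.add("f", "h", 0.5, {(1, 1)})
G.add("f", "g", 0.7, {(1, 2), (2, 1)})
print(G.neighbours("f"), G.max_degree())  # ['g', 'h'] 2
```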


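Theorem 4. The time complexities of $\mathcal{C}$ and the linear versions of the generalization algorithms are as follows:

- $\mathcal{C}$ for argument relations and $\mathfrak{A}_1^{\mathit{lin}}$: $O(n \cdot \Delta \cdot \Delta^{a^{\circ n}})$,
- $\mathcal{C}$ for argument mappings and $\mathfrak{A}_2^{\mathit{lin}}$: $O(n \cdot \Delta \cdot \Delta^{n})$.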


Proof (Sketch). In $\mathcal{C}$, in the case of argument relations, an application of the Red rule to a state $I; s$ replaces one element of $I$ of size $m$ by at most $a$ new elements, each of them of size $m - 1$. Hence, one branch in the search tree for $\mathcal{C}$, starting from a singleton set $I$ of size $n$, will have length at most $l = \sum_{i=0}^{n-1} a^i$. At each node on it there are at most $\Delta$ choices of applying Red with different $h$'s, which gives a total size of the search tree of at most $\sum_{i=0}^{l} \Delta^i$, i.e., the number of steps performed by $\mathcal{C}$ in the worst case is $O(\Delta^{a^{\circ n}})$. Those different $h$'s are obtained by intersecting the proximity classes of the heads of the terms $\{t_1, \ldots, t_m\}$ in the Red rule. In our graph representation of the proximity relation, proximity classes of symbols are exactly the adjacency lists of those symbols, which we assume are sorted. Their maximal length is $\Delta$. Hence, the work to be done at each node of the search tree of $\mathcal{C}$ is to find the intersection of at most $n$ sorted lists, each containing at most $\Delta$ elements. This needs $O(n \cdot \Delta)$ time, which gives the time complexity $O(n \cdot \Delta \cdot \Delta^{a^{\circ n}})$ of $\mathcal{C}$ for the relation case.
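The per-node work in this argument is a multi-way intersection of sorted lists. A minimal sketch using pairwise linear merges, which matches the O(n · Δ) count when each list has at most Δ entries:

```python
from typing import List, Sequence


def intersect_sorted(lists: Sequence[Sequence[str]]) -> List[str]:
    """Intersect sorted, duplicate-free lists with pairwise linear merges.

    With at most n lists of length at most Delta, each merge is O(Delta),
    so the whole intersection takes O(n * Delta) comparisons.
    """
    if not lists:
        return []
    result = list(lists[0])
    for other in lists[1:]:
        merged: List[str] = []
        i = j = 0
        while i < len(result) and j < len(other):
            if result[i] == other[j]:
                merged.append(result[i])
                i += 1
                j += 1
            elif result[i] < other[j]:
                i += 1
            else:
                j += 1
        result = merged
    return result


# Proximity classes (adjacency lists) of the heads of three terms:
print(intersect_sorted([["f", "g", "h"], ["a", "g", "h"], ["g", "h", "k"]]))  # ['g', 'h']
```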

In the mapping case, an application of the Red rule to a state $I; s$ replaces one element of $I$ of size $m$ by at most $a$ new elements of total size $m - 1$. Therefore, the maximal length of a branch is $n$, the branching factor is $\Delta$, and the amount of work at each node, as above, is $O(n \cdot \Delta)$. Hence, the number of steps in the worst case is $O(\Delta^n)$ and the time complexity of $\mathcal{C}$ is $O(n \cdot \Delta \cdot \Delta^n)$.

The fact that the consistency check is incorporated in the Dec rule in $\mathfrak{A}_1^{\mathit{lin}}$ can be used to guide the application of this rule, using the values memoized by the previous applications of Red. The very first time, the appropriate $h$ in Dec is chosen arbitrarily. In any subsequent application of this rule, $h$ is chosen according to the result of the Red rule that has already been applied to the arguments of the current AUT for their consistency check, as required by the condition of Dec. In this way, the applications of Dec and Sol will correspond to the applications of Red. There is a natural correspondence between the applications of the Rem and Tri rules. Therefore, $\mathfrak{A}_1^{\mathit{lin}}$ will have a search tree analogous to that of $\mathcal{C}$. Hence, the complexity of $\mathfrak{A}_1^{\mathit{lin}}$ is $O(n \cdot \Delta \cdot \Delta^{a^{\circ n}})$. $\mathfrak{A}_2^{\mathit{lin}}$ does not call the consistency check, but does the same work as $\mathcal{C}$ and, hence, has the same complexity $O(n \cdot \Delta \cdot \Delta^n)$.


6 Discussion and Conclusion 


The diagram below illustrates the connections between different anti-unification problems based on argument relations:

    unrestricted relations  --->  correspondence relations
              |                              |
              v                              v
    unrestricted mappings   --->  correspondence mappings


The arrows indicate the direction from more general problems to more specific ones. For the unrestricted cases (left column) we compute mcsrg's. For correspondence relations and correspondence mappings (right column), mcsg's are computed. (In fact, for them, the notions of mcsrg and mcsg coincide.) The algorithms for relations (upper row) are more involved than those for mappings (lower row): those for relations deal with AUTs containing arbitrary sets of terms, while for mappings, those sets have cardinality at most one, which simplifies the conditions in the rules. Moreover, the two cases in the lower row generalize existing anti-unification problems:


— the unrestricted mappings case generalizes the problem from [1] by extending 
similarity to proximity and relaxing the smaller-side-totality restriction; 

— the correspondence mappings case generalizes the problem from [9] by allow- 
ing permutations between arguments of proximal function symbols. 


All our algorithms can easily be turned into anti-unification algorithms for crisp tolerance relations³ by taking λ-cuts and ignoring the computation of the approximation degrees. Besides, they are modular and can be used to compute only linear generalizations by just skipping the merging rule. We provided complexity estimations for the algorithms that compute linear generalizations (which are often of practical interest).
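A λ-cut keeps exactly the pairs whose proximity degree reaches λ, which yields a crisp reflexive and symmetric (tolerance) relation. A short sketch with a hypothetical dictionary encoding of the proximity relation:

```python
from typing import Dict, Iterable, Set, Tuple


def lambda_cut(symbols: Iterable[str], R: Dict[Tuple[str, str], float],
               lam: float) -> Set[Tuple[str, str]]:
    """Crisp relation {(f, g) | R(f, g) >= lam} of a proximity relation R."""
    cut = {(f, f) for f in symbols}  # proximity relations are reflexive
    for (f, g), deg in R.items():
        if deg >= lam:
            cut.add((f, g))
            cut.add((g, f))          # ... and symmetric
    return cut


R = {("f", "g"): 0.8, ("g", "h"): 0.6}
cut = lambda_cut("fgh", R, 0.5)
# f~g and g~h, but not f~h: a tolerance relation that is not transitive.
print(("f", "g") in cut, ("g", "h") in cut, ("f", "h") in cut)  # True True False
```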


³ Tolerance: a reflexive, symmetric, not necessarily transitive relation. According to
Poincaré, a fundamental notion for mathematics applied to the physical world. 
In this paper, we did not consider cases when the same pair of symbols is related to each other by more than one argument relation. Our results can be extended to them, which would open a way towards approximate anti-unification modulo background theories specified by shallow collapse-free axioms. Another interesting direction of future work would be extending our results to quantitative algebras [10] that also deal with quantitative extensions of equality.
Abstract. Automated theorem provers (ATPs) are today used to attack
open problems in several areas of mathematics. An ongoing project by 
Kinyon and Veroff uses Prover9 to search for the proof of the Abelian 
Inner Mapping (AIM) Conjecture, one of the top open conjectures in 
quasigroup theory. In this work, we improve Prover9 on a benchmark 
of AIM problems by neural synthesis of useful alternative formulations 
of the goal. In particular, we design the 3SIL (stratified shortest solu- 
tion imitation learning) method. 3SIL trains a neural predictor through 
a reinforcement learning (RL) loop to propose correct rewrites of the 
conjecture that guide the search. 

3SIL is first developed on a simpler, Robinson arithmetic rewriting 
task for which the reward structure is similar to theorem proving. There 
we show that 3SIL outperforms other RL methods. Next we train 3SIL 
on the AIM benchmark and show that the final trained network, deciding 
what actions to take within the equational rewriting environment, proves 
70.2% of problems, outperforming Waldmeister (65.5%). When we com- 
bine the rewrites suggested by the network with Prover9, we prove 8.3% 
more theorems than Prover9 in the same time, bringing the performance 
of the combined system to 90%. 


Keywords: Automated theorem proving - Machine learning 


1 Introduction 


Machine learning (ML) has recently proven its worth in a number of fields, rang- 
ing from computer vision [17], to speech recognition [15], to playing games [28, 40] 
with reinforcement learning (RL) [45]. It is also increasingly applied in auto- 
mated and interactive theorem proving. Learned predictors have been used for 
premise selection [1] in hammers [6], to improve clause selection in saturation- 
based theorem provers [9], to synthesize functions in higher-order logic [12], and 
to guide connection-tableau provers [21] and interactive theorem provers [2, 5, 14]. 

Future growth of the knowledge base of mathematics and the complexity of 
mathematical proofs will increase the need for proof checking and its better com- 
puter support and automation. Simultaneously, the growing complexity of soft- 
ware will increase the need for formal verification to prevent failure modes [10]. 


© The Author(s) 2022 
Automated theorem proving and mathematics will benefit from more advanced
ML integration. One of the mathematical subfields that makes substantial use 
of automated theorem provers is the field of quasigroup and loop theory [32]. 


1.1 Contributions 


In this paper, we propose to use a neural network to suggest lemmas to the 
Prover9 [25] ATP system by rewriting parts of the conjecture (Sect. 2). We test 
our method on a dataset of theorems collected in the work on the Abelian Inner 
Mapping (AIM) Conjecture [24] in loop theory. For this, we use the AIMLEAP 
proof system [7] as a reinforcement learning environment. This setup is described
in Sect. 3. For development, we used a simpler Robinson arithmetic rewriting
task (Sect. 4). With the insights derived from this and a comparison with other
methods, we describe our own 3SIL method in Sect. 5. We use a neural network to 
process the state of the proving attempt, for which the architecture is described 
in Sect. 6. The results on the Robinson arithmetic task are described in Sect. 7.1. 
We show our results on the AIMLEAP proving task, both using our predictor 
as a stand-alone prover and by suggesting lemmas to Prover9 in Sect. 7.2. Our 
contributions are: 


1. We propose a training method for reinforcement learning in theorem proving 
settings: stratified shortest solution imitation learning (3SIL). This method 
is suited to the structure of theorem proving tasks. This method and the
reasoning behind it are explained in Sect. 5.

2. We show that 3SIL outperforms other baseline RL methods on a simpler, 
Robinson arithmetic rewriting task for which the reward structure is similar 
to theorem proving (Sect. 7.1). 

3. We show that a standalone neurally guided prover trained by the 3SIL 
method outperforms the hand-engineered Waldmeister prover on the AIM- 
LEAP benchmark (Sect. 7.2). 

4. We show that using a neural rewriting step that suggests rephrased versions 
of the conjecture to be added as lemmas improves the ATP performance on 
equational problems (Sects. 2 and 7.2). 


2 ATP and Suggestion of Lemmas by Neural Rewriting 


Saturation-based ATPs make use of the given-clause algorithm [30], which we briefly explain as background. A problem is expressed as a conjunction of many initial clauses (i.e., the clausified axioms and the negated goal, which is always an equation in the AIM dataset). The algorithm starts with all the initial clauses in the unprocessed set. We then pick a clause from this set to be the given clause, move it to the processed set, and do all inferences between it and the clauses in the processed set. The newly inferred clauses are added to the unprocessed set.
This concludes one iteration of the algorithm, after which we pick a new given clause and repeat [23]. Typically, this approach is designed to be refutationally complete, i.e., the algorithm is guaranteed to eventually find a contradiction if the original goal follows from the axioms.

[Fig. 1 panels: Phase 1, Rewrite Conjecture & Collect Lemmas (starting goal 3=1+1+1, rewritten goal 3=1+2, collected lemma 1+1+1=1+2); Phase 2, ATP Search With Added Lemmas.]

Fig. 1. Schematic representation of the proposed guidance method. In the first phase, we run a reinforcement learning loop to propose actions that rewrite a conjecture. This predictor is trained using the AIMLEAP proof environment. We collect the rewrites of the LHS and RHS of the conjecture. In the second phase, we add the rewrites to the ATP search input, to act as guidance. In this specific example, we only rewrote the conjecture for 1 step, but the added guidance lemmas are in reality the product of many steps in the RL loop.
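For readers unfamiliar with it, the given-clause loop described at the start of this section can be sketched in a few lines. This is an illustration only: it uses propositional binary resolution and a smallest-clause-first selection heuristic as stand-ins, whereas Prover9 works with first-order clauses, equality reasoning, and far more elaborate clause selection.

```python
from typing import FrozenSet, Iterable, List, Set

# Propositional clauses: frozensets of non-zero ints, -k is the negation of k.
Clause = FrozenSet[int]


def resolvents(c1: Clause, c2: Clause) -> Iterable[Clause]:
    """All non-tautological binary resolvents of two clauses."""
    for lit in c1:
        if -lit in c2:
            res = (c1 - {lit}) | (c2 - {-lit})
            if not any(-l in res for l in res):
                yield frozenset(res)


def given_clause_loop(initial: Iterable[Clause], max_iters: int = 10_000) -> bool:
    """Return True if the empty clause (a contradiction) is derived."""
    unprocessed: List[Clause] = list(dict.fromkeys(initial))
    processed: Set[Clause] = set()
    for _ in range(max_iters):
        if not unprocessed:
            return False                     # saturated without a contradiction
        given = min(unprocessed, key=len)    # clause selection: smallest first
        unprocessed.remove(given)
        if not given:
            return True                      # empty clause found
        if given in processed:
            continue
        processed.add(given)
        for other in list(processed):        # all inferences with processed clauses
            for new in resolvents(given, other):
                if new not in processed and new not in unprocessed:
                    unprocessed.append(new)
    return False                             # gave up at the resource limit


# Axioms p and p -> q, negated goal not q: {p}, {-p, q}, {-q}
print(given_clause_loop([frozenset({1}), frozenset({-1, 2}), frozenset({-2})]))  # True
```

Adding lemmas, as proposed in this work, simply means putting extra clauses into `initial` before the loop starts.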

This process can produce a lot of new clauses and the search space can 
become quite large. In this work, we modify the standard loop by adding useful 
lemmas to the initial clause set. These lemmas are proposed by a neural network 
that was trained from zero knowledge to rewrite the left- and right-hand sides of 
the initial goal to make them equal by using the axioms as the available rewrite 
actions. Even though the neural rewriting might not fully succeed, the rewrites 
produced by this process are likely to be useful as additional lemmas when added 
to the problem. This idea is schematically represented in Fig. 1. 
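A minimal sketch of how a rewrite trace could be turned into extra lemmas, following the Fig. 1 example. The trace format is hypothetical, and we assume every rewritten form of a side is equated with the original side; the text does not specify which intermediate rewrites are kept.

```python
from typing import Iterable, List, Tuple


def collect_lemmas(lhs: str, rhs: str,
                   trace: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn a rewrite trace into candidate lemmas 'original side = rewritten side'.

    trace holds (side, new_term) pairs, with side "lhs" or "rhs"; each new_term
    is assumed to be obtained from that side of the goal by sound rewrites.
    """
    original = {"lhs": lhs, "rhs": rhs}
    lemmas: List[Tuple[str, str]] = []
    for side, new_term in trace:
        lemma = (original[side], new_term)
        if new_term != original[side] and lemma not in lemmas:
            lemmas.append(lemma)
    return lemmas


# The example of Fig. 1: goal 3 = 1+1+1, right-hand side rewritten to 1+2.
print(collect_lemmas("3", "1+1+1", [("rhs", "1+2")]))  # [('1+1+1', '1+2')]
```

Because each rewrite is sound, such lemmas hold even when the rewriting never makes both sides equal.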


3 AIM Conjecture and the AIMLEAP RL Environment 


Automated theorem proving has been applied in the theory surrounding the 
Abelian Inner Mapping Conjecture, known as the AIM Conjecture. This is one 
of the top open conjectures in quasigroup theory. Work on the conjecture has 
been going on for more than a decade. Automated theorem provers use hundreds 
of thousands of inference steps when run on problems from this theory. 

As a testbed for our machine learning and prover guidance methods, we use a previously published dataset of problems generated from the AIM conjecture [7]. The dataset comes with a simple prover called AIMLEAP that can take machine learning advice.¹ We use this system as an RL environment. AIMLEAP keeps the state and carries out the cursor movements (the cursor determines the location of the rewrite) and rewrites that a neural predictor chooses.


¹ https://github.com/ai4reason/aimleap
The AIM conjecture concerns specific structures in loop theory [24]. A loop
is a quasigroup with an identity element. A quasigroup is a generalization of a
group that does not require associativity. This manifests in the presence of two
different ‘division’ operators, one left-division (\) and one right-division (/). We 
briefly explain the conjecture to show the nature of the data. 

For loops, the three inner mapping functions (left-translation L, right-translation R, and the mapping T) are:

$L(u, x, y) := (y * x) \backslash (y * (x * u))$    $R(u, x, y) := ((u * x) * y) / (x * y)$    $T(u, x) := x \backslash (u * x)$


These mappings can be seen as measures of the deviation from commutativity 
and associativity. The conjecture concerns the consequences of these three inner 
mapping functions forming an Abelian (commutative) group. There are two more 
notions, that of the associator function a and the commutator function K: 


$a(x, y, z) := (x * (y * z)) \backslash ((x * y) * z)$    $K(x, y) := (y * x) / (x * y)$


From these definitions, the conjecture can be stated. There are two parts to the 
conjecture. For both parts, the following equalities need to hold for all u, v, x, 
y, and z: 


$a(a(x, y, z), u, v) = 1$    $a(x, a(y, z, u), v) = 1$    $a(x, y, a(z, u, v)) = 1$


where 1 is the identity element. These are necessary, but not sufficient for the 
two main parts of the conjecture. The first part of the conjecture asks whether 
a loop modulo its center is a group. In this context, the center is the set of all 
elements that commute with all other elements. This is the case if 


$K(a(x, y, z), u) = 1$.


The second part of the conjecture asks whether a loop modulo its nucleus is an 
Abelian group. The nucleus is the set of elements that associate with all other 
elements. This is the case if 


$a(K(x, y), z, u) = 1$    $a(x, K(y, z), u) = 1$    $a(x, y, K(z, u)) = 1$
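The definitions above can be evaluated directly on a finite loop given by its Cayley table. The sketch below uses the standard division conventions (x\y is the z with x * z = y, and y/x is the z with z * x = y) and the definition of K as stated above; the class and function names are ours. Checking the abelian group Z_4 is trivial, since every inner mapping is the identity there, but it exercises each definition; the actual AIM work reasons symbolically rather than over tables.

```python
from itertools import product
from typing import List


class FiniteLoop:
    """A finite loop given by a Cayley table over the elements 0..n-1."""

    def __init__(self, table: List[List[int]]) -> None:
        self.t = table
        self.n = len(table)

    def mul(self, x: int, y: int) -> int:
        return self.t[x][y]

    def ldiv(self, x: int, y: int) -> int:
        """x \\ y: the unique z with x * z = y."""
        return next(z for z in range(self.n) if self.t[x][z] == y)

    def rdiv(self, y: int, x: int) -> int:
        """y / x: the unique z with z * x = y."""
        return next(z for z in range(self.n) if self.t[z][x] == y)

    def T(self, u: int, x: int) -> int:
        return self.ldiv(x, self.mul(u, x))

    def L(self, u: int, x: int, y: int) -> int:
        return self.ldiv(self.mul(y, x), self.mul(y, self.mul(x, u)))

    def R(self, u: int, x: int, y: int) -> int:
        return self.rdiv(self.mul(self.mul(u, x), y), self.mul(x, y))

    def a(self, x: int, y: int, z: int) -> int:
        return self.ldiv(self.mul(x, self.mul(y, z)), self.mul(self.mul(x, y), z))

    def K(self, x: int, y: int) -> int:
        return self.rdiv(self.mul(y, x), self.mul(x, y))


def center_part_holds(lp: FiniteLoop, e: int) -> bool:
    """K(a(x, y, z), u) = 1 for all x, y, z, u."""
    return all(lp.K(lp.a(x, y, z), u) == e
               for x, y, z, u in product(range(lp.n), repeat=4))


def nucleus_part_holds(lp: FiniteLoop, e: int) -> bool:
    """a(K(x, y), z, u) = a(x, K(y, z), u) = a(x, y, K(z, u)) = 1 for all elements."""
    return all(lp.a(lp.K(x, y), z, u) == e and lp.a(x, lp.K(y, z), u) == e
               and lp.a(x, y, lp.K(z, u)) == e
               for x, y, z, u in product(range(lp.n), repeat=4))


# Sanity check on the abelian group Z_4 (identity 0): all inner mappings are trivial.
z4 = FiniteLoop([[(x + y) % 4 for y in range(4)] for x in range(4)])
print(all(z4.T(u, x) == u for u, x in product(range(4), repeat=2)))  # True
print(center_part_holds(z4, 0), nucleus_part_holds(z4, 0))           # True True
```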


3.1 The AIMLEAP RL Environment 


Currently, work in this area is done using automated theorem provers such as 
Prover9 [24,25]. This has led to some promising results, but the search space 
is enormous. The main strategy for proving the AIM conjecture thus far has 
been to prove weaker versions of the conjecture (using additional assumptions) 
and then import crucial proof steps into the stronger version of the proof. The 
Prover9 theorem prover is especially suited to this approach because of its well- 
established hints mechanism [48]. The AIMLEAP dataset is derived from this 
Prover9 approach and contains around 3468 theorems that can be proven with
the supplied definitions and lemmas [7]. 

There are 177 possible actions in the AIMLEAP environment [7]. We handle 
the proof state as a tree, with the root node being an equality node. Three 
actions are cursor movements, where the cursor can be moved to an argument 
of the current position. The other actions all rewrite the current term at the 
cursor position with various axioms, definitions and lemmas that hold in the 
AIM context. As an example, this is one of the theorems in the dataset (\ and 
= are part of the language): 


$T(T(T(x, T(x, y) \backslash 1), T(x, y) \backslash 1), y) = T((T(x, y) \backslash 1) \backslash 1, T(x, y) \backslash 1)$.


The task of the machine learning predictor is to process the proof state and 
recognize which actions are most likely to lead to a proof, meaning that the two 
sides of the starting equation are equal according to the AIMLEAP system. The 
only feedback that the environment gives is whether a proof has been found or 
not: there is no intermediate reward (i.e. rewards are sparse). The ramifications 
of this are further discussed in Sect. 5.1. 


4 Rewriting in Robinson Arithmetic as an RL Task 


To develop a machine learning method that can help solve equational theorem 
proving problems, we considered a simpler arithmetic task, which also has a tree- 
structured input and a sparse reward structure: the normalization of Robinson 
arithmetic expressions. The task is to normalize a mathematical expression to 
one specific form. This task has been implemented as a Python RL environment, 
which we make available.² The learning environment incorporates an existing
dataset, constructed by Gauthier for RL experiments in the interactive theorem 
prover HOL4 [11]. Our RL setup for the task is also modeled after [11]. 

In more detail, the formalism that we use as an RL environment is Robinson 
arithmetic (RA). RA is a simple arithmetic theory. Its language contains the 
successor function $S$, addition $+$, multiplication $*$, and one constant, $0$.
The theory considers only non-negative numbers and we only use four axioms 
of RA. Numbers are represented by the constant 0 with the appropriate number 
of successor functions applied to it. The task for the agent is to rewrite an 
expression until there are only nodes of the successor or 0 types. Effectively, we 
are asking the agent to calculate the value of the expression. As an example, 
S(S(0)) + S(0), representing 2 + 1, needs to be rewritten to S(S(S(0))). 
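A minimal Python encoding of RA expressions, assumed for illustration (it is not the RewriteRL API): a tree of tuples, a function computing the value an expression denotes, and the goal test for the normal form.

```python
from typing import Tuple

# Hypothetical tree encoding: ("0",), ("S", t), ("+", a, b) or ("*", a, b).
Term = Tuple


def value(t: Term) -> int:
    """The natural number an RA expression denotes."""
    op = t[0]
    if op == "0":
        return 0
    if op == "S":
        return value(t[1]) + 1
    if op == "+":
        return value(t[1]) + value(t[2])
    if op == "*":
        return value(t[1]) * value(t[2])
    raise ValueError(f"unknown node {op!r}")


def numeral(k: int) -> Term:
    """S(S(...S(0)...)) with k successors: the required normal form of value k."""
    t: Term = ("0",)
    for _ in range(k):
        t = ("S", t)
    return t


def is_normal(t: Term) -> bool:
    """Goal test: only successor and 0 nodes remain."""
    return t[0] == "0" or (t[0] == "S" and is_normal(t[1]))


expr = ("+", numeral(2), numeral(1))      # S(S(0)) + S(0), i.e. 2 + 1
print(value(expr), numeral(value(expr)))  # 3 ('S', ('S', ('S', ('0',))))
```

The agent never sees `value`; it must reach `numeral(value(expr))` purely through rewrite actions.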

The expressions are represented as a tree data structure. Within the environ- 
ment, there are seven different rewrite actions available to the agent. The four 
axioms (equations) defining these actions are $x + 0 = x$, $x + S(y) = S(x + y)$,
$x * 0 = 0$ and $x * S(y) = (x * y) + x$, where the agent can apply the equations
in either direction. There is one exception: the multiplication by 0 cannot be 
applied from right to left, as this would require the agent to introduce a fresh 


² https://github.com/learningeqtp/rewriteRL
term which is out of scope for the current work. The place where the rewrite is
applied is denoted by the location of the cursor in the expression tree. 

In addition to the seven rewrite actions, the agent can move the cursor to 
one of the children of the current cursor node. This gives a total number of nine 
actions. Moving to a child of a node with only one child counts as moving to 
the left child. After a rewriting action, the cursor is reset to the root of the 
expression. More details on the actions are in the RewriteRL repository. 
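The seven rewrite actions and the cursor can be sketched with the same kind of tuple encoding. Action names and the path-based cursor are our own choices, not those of the RewriteRL repository. A path of child indices stands for the sequence of cursor moves from the root, which mirrors the reset of the cursor after each rewrite.

```python
from typing import Optional, Sequence, Tuple

Term = Tuple  # hypothetical encoding: ("0",), ("S", t), ("+", a, b), ("*", a, b)
Z: Term = ("0",)


def rewrite(t: Term, action: str) -> Optional[Term]:
    """Apply one of the seven axiom-based rewrites at the root of t (None if inapplicable)."""
    op = t[0]
    if action == "add0_lr" and op == "+" and t[2] == Z:          # x + 0 -> x
        return t[1]
    if action == "add0_rl":                                    # x -> x + 0
        return ("+", t, Z)
    if action == "addS_lr" and op == "+" and t[2][0] == "S":   # x + S(y) -> S(x + y)
        return ("S", ("+", t[1], t[2][1]))
    if action == "addS_rl" and op == "S" and t[1][0] == "+":   # S(x + y) -> x + S(y)
        return ("+", t[1][1], ("S", t[1][2]))
    if action == "mul0_lr" and op == "*" and t[2] == Z:        # x * 0 -> 0
        return Z
    if action == "mulS_lr" and op == "*" and t[2][0] == "S":   # x * S(y) -> (x * y) + x
        return ("+", ("*", t[1], t[2][1]), t[1])
    if action == "mulS_rl" and op == "+" and t[1][0] == "*" and t[1][1] == t[2]:
        return ("*", t[2], ("S", t[1][2]))                     # (x * y) + x -> x * S(y)
    return None


def rewrite_at(t: Term, path: Sequence[int], action: str) -> Optional[Term]:
    """Rewrite the subterm at the cursor, given as child indices (1 = left, 2 = right)."""
    if not path:
        return rewrite(t, action)
    i = path[0]
    if i >= len(t):
        return None                                            # no such child
    child = rewrite_at(t[i], path[1:], action)
    if child is None:
        return None
    return t[:i] + (child,) + t[i + 1:]


# S(S(0)) + S(0)  ->  S(S(S(0)) + 0)  ->  S(S(S(0)))
one, two = ("S", Z), ("S", ("S", Z))
step1 = rewrite_at(("+", two, one), [], "addS_lr")
step2 = rewrite_at(step1, [1], "add0_lr")
print(step2)  # ('S', ('S', ('S', ('0',))))
```

There is deliberately no right-to-left version of x * 0 = 0, since it would have to invent the term x.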


5 Reinforcement Learning Methods 


This section describes the reinforcement learning methods, while Sect. 6 then
further explains the particular neural architectures that are trained in the RL
loops. We first briefly explain here the approaches that we used as reinforcement 
learning (RL) baselines, then we go into detail about the proposed 3SIL method. 


5.1 Reinforcement Learning Baselines 


General RL Setup. For comparison, we used implementations of four established reinforcement learning baseline methods. In reinforcement learning, we consider an agent that is acting within an environment. The agent can take actions $a$ from the action space $A$ to change the state $s \in S$ of the environment. The agent can be rewarded for certain actions taken in certain states, with the reward given by the reward function $R : (S \times A) \to \mathbb{R}$. The behavior of the environment is given by the state transition function $P : (S \times A) \to S$. The history of the agent's actions and the environment's states and rewards at each timestep $t$ is collected in tuples $(s_t, a_t, r_t)$. For a given history of a certain agent within an environment, we call the list of tuples $(s_t, a_t, r_t)$ describing this history an episode. The policy function $\pi : S \to A$ allows the agent to decide which action to take. The agent's goal is to maximize the return $R$: the sum of discounted rewards $\sum_{t \geq 0} \gamma^t r_t$, where $\gamma$ is a discount factor that allows control over how heavily rewards further in the future should be weighted. We will use $R_t$ when we mean $R$, but calculated only from rewards from timestep $t$ on. In the end, we are thus looking for a policy function $\pi$ that maximizes the expected (discounted) return $R$ [45].
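The return $R_t$ can be computed for every timestep in one backward pass. A short sketch, shown on a reward sequence with a single reward of 1 at the end, which is the shape of a successful proof attempt:

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """R_t = sum over k >= t of gamma^(k - t) * r_k, computed backwards in one pass."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_returns([0, 0, 0, 1], gamma=0.9))  # approx. [0.729, 0.81, 0.9, 1.0]
print(discounted_returns([0, 0, 0, 0], gamma=0.9))  # failed episode: all zeros
```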

In our setting, every proof attempt (in the AIM setting) or normalization 
attempt (in the Robinson arithmetic setting) corresponds to an episode. The 
reward structure of theorem proving is such that there is only a reward of 1 at 
the end of a successful episode (i.e. a proof was found in AIM). Unsuccessful 
episodes get a reward of 0 at every timestep t. 


A2C. The first method, Advantage Actor-Critic (A2C) [27], contains ideas on
which the other three RL baseline methods build, so we will go into more detail
for this method, while keeping the explanation for the other methods brief. For 
details we refer to the corresponding papers. 
A2C attempts to find suitable parameters for an agent by minimizing a loss function consisting of two parts:
$$L = L^{A2C}_{\mathit{policy}} + L^{A2C}_{\mathit{value}}.$$
In addition to the policy function $\pi$, the agent has access to a value function $V : S \to \mathbb{R}$ that predicts the sum of future rewards obtained when given a state. In practice, both the policy and the value function are computed by a neural network predictor. The parameters of the predictor are set by stochastic gradient descent to minimize $L$. The set of parameters of the predictor that defines the policy function $\pi$ is named $\theta$, while the parameters that define the value function are named $\mu$. The first part of the loss is the policy loss, which for one time step has the form
$$L^{A2C}_{\mathit{policy}} = -\log \pi_\theta(a_t \mid s_t)\, A(s_t, a_t),$$
where $A(s, a)$ is the advantage function. The advantage function can be formulated in multiple ways, but the simplest is as $R_t - V_\mu(s_t)$. That is to say: the advantage of an action in a certain state is the difference between the discounted rewards $R_t$ after taking that action and the value estimate of the current state. Minimizing $L^{A2C}_{\mathit{policy}}$ amounts to maximizing the log probability of predicting actions that are judged by the advantage function to lead to high reward.

The value estimates $V_\mu(s)$ for computing the advantage function are supplied by the value predictor $V$ with parameters $\mu$, which is trained using the loss
$$L^{A2C}_{\mathit{value}} = \frac{1}{2}\left(R_t - V_\mu(s_t)\right)^2,$$
which minimizes the squared advantage. The logic of this is that the value estimate at timestep $t$, $V_\mu(s_t)$, will learn to incorporate the later rewards $R_t$, ensuring that when later seeing the same state, the possible future reward will be considered. Note that the sets of parameters $\theta$ and $\mu$ are not necessarily disjoint (see Sect. 6).

Note how the above equations are affected if no non-zero reward $r_t$ is obtained at any timestep. In that case, the value function $V_\mu(s_t)$ will estimate (correctly) that any state will get 0 reward, which means that the advantage function $A(s, a)$ will also be 0 everywhere. This means that $L^{A2C}_{\mathit{policy}}$ will be 0 in most cases, which will lead to little or no change in the parameters of the predictor: learning will be very slow. This is the difficult aspect of the structure of theorem proving: there is only a reward at the end of a successful proof, and nowhere else. This suggests a possible strategy: imitate successful episodes without a value function. In this case, we would only need to train a policy function, and no approximate value function. This is an aspect we explore in the design of our own method 3SIL, which we explain shortly.
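A numeric sketch of the per-episode A2C losses defined above, which makes the zero-reward problem visible. It only computes loss values; a real implementation would obtain gradients from an autodiff framework and treat the advantage as a constant in the policy term.

```python
from typing import List, Sequence, Tuple


def a2c_losses(log_probs: Sequence[float], values: Sequence[float],
               rewards: Sequence[float], gamma: float) -> Tuple[float, float]:
    """Episode-level A2C losses with the simple advantage A_t = R_t - V(s_t)."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):            # discounted returns R_t, backwards
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    advantages = [R - v for R, v in zip(returns, values)]
    policy_loss = sum(-lp * adv for lp, adv in zip(log_probs, advantages))
    value_loss = sum(0.5 * adv ** 2 for adv in advantages)
    return policy_loss, value_loss


# Failed proof attempt, value head already predicting 0: nothing to learn from.
print(a2c_losses([-0.5, -0.5, -0.5], [0.0, 0.0, 0.0], [0, 0, 0], gamma=0.99))  # (0.0, 0.0)
# Successful attempt: positive advantages push up the taken actions' log-probabilities.
print(a2c_losses([-0.5, -0.5], [0.0, 0.0], [0, 1], gamma=1.0))  # (1.0, 1.0)
```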

Compared to two-player games, such as chess and go, for which many 
approaches have been tailored and successfully used [41], theorem-proving has 
the property that it is hard to collect useful examples to learn from, as only 
successful proofs are likely to contain useful knowledge. In chess or go, however,
one player almost always wins and the other loses, which means that we can at 
least learn from the difference between the two strategies used by those players. 
As an example, we executed 2 million random proof attempts on the AIMLEAP 
environment, which led to 300 proofs to learn from, whereas in a two-player 
setting like chess, we would get 2 million games in which one player would likely 
win. 


ACER. The second RL baseline method we tested in our experiments is ACER, 
Actor-Critic with Experience Replay [49]. This approach can make use of data 
from older episodes to train the current predictor. ACER applies corrections to 
the value estimates so that data from old episodes may be used to train the 
current policy. It also uses trust region policy optimization [35] to limit the size 
of the policy updates. This method is included as a baseline to check if using a 
larger replay buffer to update the parameters would be advantageous. 


PPO. Our third RL baseline is the widely used proximal policy optimization 
(PPO) algorithm [36]. It restricts the size of the parameter update to avoid 
causing a large difference between the original predictor’s behavior and the 
updated version’s behavior. The method is related to the above trust region 
policy optimization method. In this way, PPO addresses the training instability 
of many reinforcement learning approaches. It has been used in various settings, 
for example complex video games [4]. This versatility makes PPO a strong
general-purpose baseline. We use the PPO algorithm with the clipped objective, as in [36].
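For reference, the per-sample clipped surrogate of PPO in its standard form. This is a sketch of the textbook objective, not the configuration used in the experiments, and eps = 0.2 is only a common default.

```python
def ppo_clip_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Negative clipped surrogate for one sample.

    ratio = pi_new(a|s) / pi_old(a|s); taking min(ratio * A, clip(ratio) * A)
    removes any incentive to move the ratio outside [1 - eps, 1 + eps].
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return -min(ratio * advantage, clipped * advantage)


print(ppo_clip_loss(1.5, 1.0))   # -1.2: the gain from raising the probability is capped
print(ppo_clip_loss(0.5, -1.0))  # 0.8: and so is the gain from lowering it
```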


SIL-PAAC. Our final RL baseline uses only the transitions with positive advan- 
tage to train on for a portion of the training procedure, to learn more from good 
episodes. This was proposed as self-imitation learning (SIL) [29]. To avoid con- 
fusion with the method that we are proposing, we extend the acronym to SIL- 
PAAC, for positive advantage actor-critic. This algorithm outperformed A2C 
on the sparse-reward task Montezuma’s Revenge (a puzzle game). As theorem 
proving has a sparse reward structure, we included SIL-PAAC as a baseline. 
More information about the implementations for the baselines can be found in 
the Implementation Details section at the end of this work. 
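A sketch of the self-imitation terms for one stored transition, following the form of the SIL objective as we understand it from [29]: the advantage is clipped at zero, so only better-than-expected transitions are imitated. The function name is ours.

```python
from typing import Tuple


def sil_losses(log_prob: float, ret: float, value: float) -> Tuple[float, float]:
    """Self-imitation terms for one stored transition (s, a, R).

    Only transitions whose return beats the current value estimate, i.e. with a
    positive advantage R - V(s), contribute; all others are ignored.
    """
    pos_adv = max(ret - value, 0.0)
    policy_loss = -log_prob * pos_adv
    value_loss = 0.5 * pos_adv ** 2
    return policy_loss, value_loss


print(sil_losses(-0.5, ret=1.0, value=0.25))  # (0.375, 0.28125): imitated
print(sil_losses(-0.5, ret=0.0, value=0.3))   # (0.0, 0.0): ignored
```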


5.2 Stratified Shortest Solution Imitation Learning 


We introduce stratified shortest solution imitation learning (3SIL) to tackle the 
equational theorem proving domain. It learns to explicitly imitate the actions 
taken during the shortest solutions found for each problem in the dataset. We do 
this by minimizing the cross-entropy $-\log p(a_{\mathit{solution}} \mid s_t)$ between the predictor
output and the actions taken in the shortest solution. This is in contrast to the 
baseline methods, where value functions are used to judge the utility of decisions. 

In our procedure, this is not the case. Instead, we build upon the assumption for data selection that shorter proofs are better in the context of theorem proving and expression normalization. In a sense, we value decisions from shorter proofs more and explicitly imitate those transitions. We keep a history $H$ for each problem, where we store the current shortest solution (states seen and actions taken) found for that problem in the training dataset. We can also store multiple shortest solutions for each problem if there are multiple strategies for a proof (the number of solutions kept is governed by the parameter $k$).

Algorithm 1. CollectEpisode
Input: problem $p$, policy $\pi_\theta$, problem history $H$
  Generate episode by following noisy version of $\pi_\theta$ on $p$
  If solution, add list of tuples $(s, a)$ to $H[p]$
  Keep $k$ shortest solutions in $H[p]$

Algorithm 2. 3SIL
Input: set of problems $P$, randomly initialized policy $\pi_\theta$, batch size $B$, number of batches $NB$, problem history $H$, number of warmup episodes $m$, number of episodes $f$, max epochs $ME$
Output: trained policy $\pi_\theta$, problem history $H$
for $e = 0$ to $ME - 1$ do
  if $e = 0$ then $num = m$ else $num = f$
  for $i = 0$ to $num - 1$ do
    CollectEpisode(sample($P$), $\pi_\theta$, $H$) (Algorithm 1)
  end for
  for $i = 0$ to $NB - 1$ do
    Sample $B$ tuples $(s, a)$ with uniform probability for each problem from $H$
    Update $\theta$ to lower $-\sum_{b=1}^{B} \log \pi_\theta(a_b \mid s_b)$ by gradient descent
  end for
end for

During training, in the case k = 1, we sample state-action pairs from each 
problem’s current shortest solution at an equal probability (if a solution was 
found). To be precise, we first randomly pick a theorem for which we have a 
solution, and then randomly sample one transition from the shortest encountered 
solution. This directly counters one of the phenomena that we had observed: the 
training examples for the baseline methods tend to be dominated by very long 
episodes (as they contribute more states and actions). This stratified sampling 
method ensures that problems with short proofs get represented equally in the 
training process. 
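A minimal sketch of this stratified sampling step, assuming the history maps each problem to its stored solutions (lists of (state, action) pairs). The name `sample_batch` is hypothetical, and picking uniformly among the k stored solutions when k > 1 is an assumption of this sketch; the text only specifies the k = 1 case.

```python
import random
from typing import Dict, Hashable, List, Tuple

Transition = Tuple[Hashable, int]  # (state, action)


def sample_batch(history: Dict[str, List[List[Transition]]], batch_size: int,
                 rng: random.Random) -> List[Transition]:
    """Stratified sampling: pick a solved problem uniformly, then one of its transitions."""
    solved = [p for p, solutions in history.items() if solutions]
    batch = []
    for _ in range(batch_size):
        problem = rng.choice(solved)             # each solved problem equally likely
        solution = rng.choice(history[problem])  # assumption: uniform over the k stored proofs
        batch.append(rng.choice(solution))       # one (s, a) pair from that proof
    return batch
```

With one solved problem whose proof has 2 transitions and another whose proof has 100, each problem still supplies about half of every batch, whereas uniform sampling over all stored transitions would draw roughly 98% of the examples from the long proof.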

The 3SIL algorithm is described in more detail in Algorithm 2. Sampling from
a noisy version of policy π_θ means that actions are sampled from the
predictor-defined distribution, except that in 5% of cases a random valid action
is selected. This is also known as the ε-greedy policy (with ε = 0.05).
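The noisy action selection used during episode collection might look as follows. Restricting the predictor's distribution to the currently valid actions is an assumption of this sketch, and the function name `noisy_action` is hypothetical.

```python
import random
from typing import Sequence


def noisy_action(probs: Sequence[float], valid: Sequence[int],
                 rng: random.Random, epsilon: float = 0.05) -> int:
    """Sample an action from the predictor, but with probability epsilon pick a random valid one."""
    valid = list(valid)
    if rng.random() < epsilon:
        return rng.choice(valid)  # exploration: uniform over the valid actions
    # Assumption: the predictor's distribution is restricted to the valid actions.
    weights = [probs[a] for a in valid]
    return rng.choices(valid, weights=weights, k=1)[0]
```

Greedy evaluation, as used for 3SIL, would instead always return the valid action with the highest probability.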


Related Methods. Our approach is similar to the imitation learning algorithm 
DAGGER (Dataset Aggregation), which was used for several games [34] and 
modified for branch-and-bound algorithms in [16]. The behavioral cloning (BC) 
technique used in robotics [47] also shares some elements. 3SIL differs
significantly from DAGGER and BC: it does not use an outside expert to
obtain useful data, it uses a stratified sampling procedure, and it selects
the shortest solutions for each problem in the training dataset.
We include as an additional baseline an implementation of behavioral cloning 
(BC), where we regard proofs already encountered as coming from an expert. 
We minimize cross-entropy between the actions in proofs we have found and 
the predictions to train the predictor. For BC, there is no stratified sampling 
or shortest solution selection, only the minimization of cross-entropy between 
actions taken from recent successful solutions and the predictor’s output. 


Extensions. For the AIM tasks, we introduce two other techniques, biased
sampling and episode pruning. In biased sampling, problems without a solution
in the history are sampled five times as often as solved problems during episode
collection, to accelerate progress. This factor was determined by testing 1, 2, 5
and 10 as sampling proportions. For episode pruning, when the agent encounters the
same state twice, we prune the episode to remove the loop before storing the
episode. This helps the predictor learn to avoid these loops.
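Both extensions can be sketched in a few lines. The function names are hypothetical, and the exact pruning rule (keep the action taken at the last visit of a repeated state, drop everything in between) is one reasonable reading of the description, not necessarily the authors' code.

```python
import random
from typing import AbstractSet, Dict, Hashable, List, Sequence, Tuple


def sample_problem(problems: Sequence[str], solved: AbstractSet[str],
                   rng: random.Random, bias: float = 5.0) -> str:
    """Biased sampling: unsolved problems get `bias` times the weight of solved ones."""
    weights = [1.0 if p in solved else bias for p in problems]
    return rng.choices(list(problems), weights=weights, k=1)[0]


def prune_loops(episode: List[Tuple[Hashable, int]]) -> List[Tuple[Hashable, int]]:
    """Episode pruning: cut out every stretch between two visits of the same state."""
    pruned: List[Tuple[Hashable, int]] = []
    position: Dict[Hashable, int] = {}  # state -> its index in `pruned`
    for state, action in episode:
        if state in position:
            cut = position[state]
            for s, _ in pruned[cut:]:  # forget the states inside the loop
                del position[s]
            pruned = pruned[:cut]      # resume from the first visit of `state`
        position[state] = len(pruned)
        pruned.append((state, action))
    return pruned
```

For example, the episode A, B, A, C (with actions 0, 1, 2, 3) is stored as A, C with actions 2, 3, so the predictor never imitates the detour through B.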


6 Neural Architectures 


The tree-structured states representing expressions occurring during the tasks 
will be processed by a neural network. The neural network takes the tree- 
structured state and predicts an action to take that will bring the expression 
closer to being normalized or the theorem closer to being proven. 


[Fig. 2 diagram labels: Addition Layer, Processor Network, output p(action | s)]

Fig. 2. Schematic representation of the creation of a representation of an expression (an 
embedding) using different neural network layers to represent different operations. The 
figure depicts the creation of a numerical representation for the Robinson arithmetic 
expression (S(0) + 0). Note that the successor layer and the addition layer consist of
trainable parameters, for which the values are set through gradient descent. 
There are two main components to the neural network we use: an embedding
tree neural network that outputs a numerical vector representing the
tree-structured proof state, and a second processor network that takes this vector
representation of the state and outputs a distribution over the actions possible in
the environment.3

Tree neural networks have been used in various settings, such as natural lan- 
guage processing [20] and also in Robinson arithmetic expression embedding [13]. 
These networks consist of smaller neural networks, each representing one of the 
possible functions that occur in the expressions. For example, there will be sep- 
arate networks representing addition and multiplication. The cursor is a special 
unary operation node with its own network that we insert into the tree at the 
current location. For each unique constant, such as the constant 0 in RA or the 
identity element 1 for the AIM task, we generate a random vector (from a stan- 
dard normal distribution) that will represent this leaf. In the case of the AIM 
task, these vectors are parameters that can be optimized during training. 

At prediction time, the numerical representation of a tree is constructed by 
starting at the leaves of the tree, for which we can look up the generated vectors. 
These vectors act as input to the neural networks that represent the parent node’s 
operation, yielding a new vector, which now represents the subtree of the parent 
node. The process repeats until there is a single vector for the entire tree after 
the root node is processed (see also Fig. 2). 

The neural networks representing each operation consist of a linear transfor- 
mation, a non-linearity in the form of a rectified linear unit (ReLU) and another 
linear transformation. In the case of binary operations, the first linear transfor- 
mation will have an input dimension of 2n and an output dimension of n, where 
n is the dimension of the vectors representing leaves of the tree (the internal rep- 
resentation size). The weights representing these transformations are randomly 
initialized at the beginning of training. 
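The bottom-up embedding described above can be illustrated with a forward pass in plain Python. This is only a sketch: the authors use PyTorch and train all weights by gradient descent, whereas here the weights are random and untrained; the class names, the nested-tuple tree encoding, and the zero-initialized biases are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple, Union

Vector = List[float]
Tree = Union[str, Tuple]  # a leaf constant name, or (operator, child, child, ...)


def _linear(w: Sequence[Vector], b: Vector, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class OperatorNet:
    """Linear -> ReLU -> Linear; the first layer maps arity * n inputs to n outputs."""

    def __init__(self, arity: int, n: int, rng: random.Random):
        n_in = arity * n
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n)]
        self.b1 = [0.0] * n
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)] for _ in range(n)]
        self.b2 = [0.0] * n

    def __call__(self, x: Vector) -> Vector:
        hidden = [max(0.0, v) for v in _linear(self.w1, self.b1, x)]  # ReLU
        return _linear(self.w2, self.b2, hidden)


class TreeEmbedder:
    def __init__(self, arities: Dict[str, int], constants: Sequence[str],
                 n: int, seed: int = 0):
        rng = random.Random(seed)
        # One standard-normal random vector per constant leaf (e.g. 0 in RA).
        self.leaves = {c: [rng.gauss(0.0, 1.0) for _ in range(n)] for c in constants}
        # One small network per operator, including the unary cursor node.
        self.nets = {op: OperatorNet(k, n, rng) for op, k in arities.items()}

    def embed(self, tree: Tree) -> Vector:
        if isinstance(tree, str):
            return self.leaves[tree]
        op, *children = tree
        # Bottom-up: embed the children, concatenate them, apply the operator's network.
        x = [v for child in children for v in self.embed(child)]
        return self.nets[op](x)


if __name__ == "__main__":
    emb = TreeEmbedder({"add": 2, "S": 1, "cursor": 1}, ["0"], n=16)
    vec = emb.embed(("add", ("cursor", ("S", "0")), "0"))  # S(0) + 0, cursor on S(0)
    print(len(vec))  # 16
```

Because the networks are shared across all occurrences of an operator, trees of any size and shape map to a vector of the same dimension n.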

When we have obtained a single vector embedding representing the entire tree 
data structure, this vector serves as the input to the predictor neural network, 
which consists of three linear layers, with non-linearities (Sigmoid/ReLU) in 
between these layers. The last layer has an output dimension equal to the number 
of possible actions in the environment. We obtain a probability distribution over 
the actions, e.g. by applying the softmax function to the output of this last layer. 
In the cases where we also need a value prediction, there is a parallel last layer 
that predicts the state’s value (usually referred to as a two-headed network [41]). 
The internal representation size n for the Robinson arithmetic experiments is set 
to 16, for the AIM task this is 32. The number of neurons in each layer (except 
for the last one) of the predictor networks is 64. 
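Written out, the predictor head is roughly the following, where $e \in \mathbb{R}^n$ is the tree embedding, $W_1, W_2, W_3$ (with biases $b_1, b_2, b_3$) are the three linear layers with 64, 64 and $|A|$ outputs ($|A|$ being the number of actions), and $\varphi$ stands for the non-linearity. The text allows sigmoid or ReLU, so which one sits at which position is left open in this sketch. The value head of the two-headed variant is a parallel last layer on the same hidden vector $h_2$.

```latex
\begin{aligned}
h_1 &= \varphi(W_1 e + b_1), \qquad h_2 = \varphi(W_2 h_1 + b_2),\\
\pi_\theta(a \mid s) &= \frac{\exp\big((W_3 h_2 + b_3)_a\big)}{\sum_{a'} \exp\big((W_3 h_2 + b_3)_{a'}\big)},\\
V(s) &= w_v^{\top} h_2 + b_v \qquad \text{(two-headed variant only)}
\end{aligned}
```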

In the AIM dataset task, an arbitrary number of variables can be introduced 
during the proof. These are represented by untrainable random vectors. We add a 
special neural network (with the same architecture as the networks representing 
unary operations, so from size n to n) that processes these vectors before they are 


3 In the reinforcement learning baselines that we use, this second processor network 
has the additional task of predicting the value of a state. 
processed by the rest of the tree neural network embedding. The idea is that this
neural network learns to project these new variable vectors into a subspace and 
that an arbitrary number of variables can be handled. The vectors are resampled 
at the start of each episode, so the agent cannot learn to recognize specific 
variables. This approach was partly inspired by the prime mechanism in [13], but 
we use separate vectors for all variables instead of building vectors sequentially. 
All our neural networks are implemented using the PyTorch library [31]. 


7 Experiments 


We first describe our experiments on the Robinson arithmetic task, with which 
we designed the properties of our 3SIL approach with the help of comparisons 
with other algorithms. We then train a predictor using 3SIL on the AIMLEAP 
loop theory dataset, which we evaluate both as a standalone prover within the 
RL environment and as a neural guidance mechanism for the ATP Prover9. 


7.1 Robinson Arithmetic Dataset 


Dataset Details. The Robinson arithmetic dataset [11] is split into three dis- 
tinct sets, based on the number of steps that it takes a fixed rewriting strategy 
to normalize the expression. This fixed strategy, LOPL, which stands for left 
outermost proof length, always rewrites the leftmost possible element. If it takes 
this strategy fewer than 90 steps to solve the problem, it is in the low difficulty
category. Problems with a difficulty between 90 and 130 are in the medium cat- 
egory and a greater difficulty than 130 leads to the high category. The high 
dataset also contains problems the LOPL strategy could not solve within the 
time limit. The low dataset is split into a training and testing set. We train on 
the low difficulty problems, but after training we also test on problems with a 
higher difficulty. Because we have a difficulty measure for this dataset, we use a 
curriculum setup. We start by learning to normalize the expressions that a fixed 
strategy can normalize in a small number of steps. This setup is similar to [11].


Training Setup. The 400 problems with the lowest difficulty are the starting 
point. Every time an agent reaches 95 percent success rate when evaluated on a 
sample of size 400 from these problems, we add the next 400 more difficult problems to
the set of training problems P. One iteration of the collection and training phase
is called an epoch. Agents are evaluated after every epoch. The blocks of size
400 are called levels. The numbers of episodes m and f are both set to 1000. For 3SIL
and BC, the batch size B is 32 and the number of batches NB is 250. The
baselines are configured so that the number of episodes and training transitions 
is at least as many as the 3SIL/BC approaches. Episodes that take over 100 
steps are stopped. ADAM [22] is used as an optimizer. 
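The level-advancement rule of the curriculum can be sketched as below, assuming the problems are pre-sorted by LOPL difficulty and that the training set P is always a prefix of that ordering. The success rate on the 400-problem evaluation sample is assumed to be computed elsewhere, and the function name is hypothetical.

```python
from typing import List, Sequence


def maybe_advance(training_set: List[str], ordered_problems: Sequence[str],
                  success_rate: float, level_size: int = 400,
                  threshold: float = 0.95) -> bool:
    """Append the next level of problems once the success rate reaches the threshold."""
    if success_rate < threshold or len(training_set) >= len(ordered_problems):
        return False  # not good enough yet, or the curriculum is exhausted
    start = len(training_set)
    training_set.extend(ordered_problems[start:start + level_size])
    return True
```

Keeping earlier levels in P means the agent keeps practising the easier problems while harder ones are added.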
Fig. 3. The level in the curriculum reached by each method. Each method was run three
times. The bold line shows the mean performance and the shaded region shows the 
minimum and maximum performance. K is the number of proofs stored per problem. 


Results on RA Curriculum. In Fig. 3, we show the progression through the 
training curriculum for behavioral cloning (BC), the RL methods (PPO, ACER) 
and two configurations of 3SIL. Behavioral cloning simply imitates actions from 
successful episodes. Of the RL baselines, PPO reaches the second level in one run, 
while ACER steadily solves the first level and in the best run solves around 80% 
of the second level. Neither method learns enough solutions on the second
level to advance to the third. A2C and SIL-PAAC do not reach the second level,
so these are left out of the plot. However, they do learn to solve about 70-80% of 
the first 400 problems. From these results we can conclude that the RL baselines 
do not perform well on this task in our experiment. We attribute this to the 
difficulty of learning a good value function due to the sparse rewards (Sect. 5.1). 
Our hypothesis is that because this value estimate influences the policy updates, 
the RL methods do not learn well on this task. Note that the two methods with 
a trust region update mechanism, ACER and PPO, perform better than the 
methods without this mechanism. From these results, it is clear that 3SIL with 
1 shortest proof stored, k = 1, is the best-performing configuration. It reaches 
the end of the training curriculum of about 5000 problems in 40 epochs. We 
experimented with k = 3 and k = 4, but these were both worse than k = 2. 


Generalization. While our approach works well on the training set, we must 
check if the predictors generalize to unseen examples. Only the methods that 
reached the end of the curriculum are tested. In Table 1, we show the results 
of evaluating the performance of our predictors on the three different test sets: 
the unseen examples from the low dataset and the unseen examples from the 
medium and high datasets. Because we expect longer solutions, the episode limits 
are expanded from 100 steps to 200 and 250 for the medium and high datasets 
respectively. For the low and medium datasets, the second of which contains 
problems with more difficult solutions than the training data, the predictors 
solve almost all test problems. For the high difficulty dataset, the performance
drops by at least 20 percentage points. Our method outperforms the Monte Carlo Tree
Search approach used in [11] on the same datasets, which got to 0.954 on the low 
dataset with 1600 iterations and 0.786 on the medium dataset (no results on the 
high dataset were reported). These results indicate that this training method 
might be strong enough to perform well on the AIM rewriting RL task. 


Table 1. Generalization with greedy evaluation on the test set for the Robinson arith- 
metic normalization tasks, shown as average success rate and standard deviation from 
3 training runs. Generalization is high on the low and medium difficulty (training data 
is similar to the low difficulty dataset). With high difficulty data, performance drops. 


Method        Low           Medium        High
3SIL (k=1)    1.00 ± 0.01   0.98 ± 0.03   0.77 ± 0.10
3SIL (k=2)    0.99 ± 0.00   0.96 ± 0.01   0.66 ± 0.08
BC            0.98 ± 0.01   0.98 ± 0.01   0.56 ± 0.05


7.2 AIM Conjecture Dataset 


Training Setup. Finally, we train and evaluate 3SIL on the AIM Conjecture 
dataset. We apply 3SIL (k = 1) to train predictors in the AIMLEAP environ- 
ment. Ten percent of the AIM dataset is used as a hold-out test set, not seen 
during training. As there is no estimate for the difficulty of the problems in terms 
of the actions available to the predictor, we do not use a curriculum ordering 
for these experiments. The number m of episodes collected before training is 
set to 2,000,000. These random proof attempts result in about 300 proofs. The 
predictor learns from these proofs and afterwards the search for new proofs is 
also guided by its predictions. For the AIM experiments, episodes are stopped 
after 30 steps in the AIMLEAP environment. The predictors are trained for 100 
epochs. The number of collected episodes per epoch f is 10,000. The successful 
proofs are stored, and the shortest proof for each theorem is kept. NB is 500 and
the batch size B is set to 32. The number of problems with a solution in the history after each
epoch of the training run is shown in Fig. 4. 


Results as a Standalone Prover. After 100 epochs, about 2500 of 3114 prob- 
lems in the training dataset have a solution in their history. To test the general- 
ization capability of the predictors, we inspect their performance on the holdout 
test set problems. In Table 2 we compare the success rate of the trained predictors
on the holdout test set with three different automated theorem provers:
E [37,38], Waldmeister [19] and Prover9. E is currently one of the best overall 
automated theorem provers [44], Waldmeister is a prover specialized in memory- 
efficient equational theorem proving [18] and Prover9 is the theorem prover that 
Fig. 4. The number of training problems for which a solution was encountered and
stored (cumulative). At the start of the training, the models rapidly collect more solu- 
tions, but after 100 epochs, the process slows down and settles at about 2500 problems 
with known solutions. The minimum, maximum and mean of three runs are shown. 


is used for AIM conjecture research and the prover that the dataset was gener- 
ated by. Waldmeister and E are the best performing solvers in competitions for 
the relevant unit equality (UEQ) category [44]. 


Table 2. Theorem proving performance on the hold-out test set in fraction of problems 
solved. Means and standard deviations are the results of evaluations of 3 different 
predictors from 3 different training runs on the 354 unseen test set problems. 


Method                        Success rate
Prover9 (60 s)                0.833
E (60 s)                      0.802
Predictor + AIMLEAP (60 s)    0.702 ± 0.015
Waldmeister (60 s)            0.655
Predictor + AIMLEAP (1×)      0.586 ± 0.029


The results show that a single greedy evaluation of the predictor trying to 
solve the problem in the AIMLEAP environment is not as strong as the theo- 
rem proving software. However, the theorem provers got 60s of execution time, 
and the execution of the predictor, including interaction with AIMLEAP, takes 
on average less than 1 s. We allowed the predictor setup to use 60 s by running
attempts in AIMLEAP until the time was up, sampling actions from the
predictor’s distribution with 5% noise instead of using greedy execution. With
this approach, the predictor setup outperforms Waldmeister.4 Figure 5 shows the 
overlap between the problems solved by each prover. The diagram shows that 
each theorem prover found a few solutions that no other prover could find within 


4 After the initial experiments, we also evaluated Twee [42], which won the most recent
UEQ track: it can prove most of the test problems in 60 s, only failing for 1 problem. 
the time limit. Almost half of all problems from the test set that are solved are
solved by all four systems. 


[Venn diagram with four sets: Waldmeister, E, Prover9, Predictor + AIMLEAP]


Fig. 5. Venn diagram of the test set problems solved by each solver with 60 s time 
limit. 


Results of Neural Rewriting Combined with Prover9. We also combine 
the predictor with Prover9. In this setup, the predictor modifies the starting 
form of the goal, for a maximum of 1s in the AIMLEAP environment. This 
produces new expressions on one or both sides of the equality. We then add, as 
lemmas, equalities between the left-hand side of the goal before the predictor’s 
rewriting and after each rewriting (see Fig. 1). The same is done for the right- 
hand side. For each problem, this procedure yields new lemmas that are added 
to the problem specification file that is given to Prover9. 
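The lemma construction can be sketched as follows, assuming that the goal side and its successive rewritten forms are already available as strings in Prover9's term syntax. The helper names are hypothetical, skipping repeated forms is a choice of this sketch, and only Prover9's standard `formulas(assumptions).` / `formulas(goals).` / `end_of_list.` list syntax is used.

```python
from typing import List, Sequence


def rewrite_lemmas(original: str, rewrites: Sequence[str]) -> List[str]:
    """One equality `original = r.` per distinct rewritten form r of one goal side."""
    lemmas = []
    seen = {original}
    for r in rewrites:
        if r not in seen:  # skip repeats and the unchanged side
            seen.add(r)
            lemmas.append(f"{original} = {r}.")
    return lemmas


def prover9_input(axioms: Sequence[str], lemmas: Sequence[str], goal: str) -> str:
    """Assemble a Prover9 problem with the lemmas added as extra assumptions."""
    lines = ["formulas(assumptions)."]
    lines += [f"  {f}" for f in list(axioms) + list(lemmas)]
    lines += ["end_of_list.", "", "formulas(goals).", f"  {goal}", "end_of_list."]
    return "\n".join(lines)
```

The function would be called once for the left-hand side and once for the right-hand side of the goal, and both lemma lists passed to `prover9_input`.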


Table 3. Prover9 theorem proving performance on the hold-out test set when injecting 
lemmas suggested by the learned predictor. Prover9’s performance increases when 
using the suggested lemmas. 


Method                              Success rate
Prover9 (1 s)                       0.715
Prover9 (2 s)                       0.746
Prover9 (60 s)                      0.833
Rewriting (1 s) + Prover9 (1 s)     0.841 ± 0.019
Rewriting (1 s) + Prover9 (59 s)    0.902 ± 0.016
Table 3 shows that adding lemmas suggested by the rewriting actions
of the trained predictor improves the performance of Prover9. Running Prover9
for 2 s results in better performance than running it for 1 s, as expected. The
combined (1 s + 1 s) system improved on Prover9’s 2 s performance by 12.7%
(0.841/0.746 ≈ 1.127), indicating that the predictor suggests useful lemmas. Additionally,
1 s of neural rewriting combined with 59 s of Prover9 search proves about 8.3%
(0.902/0.833 ≈ 1.083) more theorems than Prover9 with a 60 s time limit (Table 2).


7.3 Implementation Details 


All experiments for the Robinson task were run on a 16 core Intel(R) Xeon(R) 
CPU E5-2670 0 @ 2.60 GHz. The AIM experiments were run on a 72 core Intel(R) 
Xeon(R) Gold 6140 CPU @ 2.30 GHz. All calculations were done on CPU. The 
PPO implementation was adapted from an existing implementation [3]. The 
model was updated every 2000 timesteps, the PPO clip coefficient was set to 
0.2. The learning rate was 0.002 and the discount factor γ was set to 0.99.
The ACER implementation was adapted from an available implementation [8]. 
The replay buffer size was 20,000. The truncation parameter was 10 and the 
model was updated every 100 steps. The replay ratio was set to 4. Trust region 
decay was set to 0.99 and the constraint was set to 1. The discount factor was 
set to 0.99 and the learning rate to 0.001. Off-policy minibatch size was set 
to 1. The A2C and SIL implementations were based on the PyTorch actor-critic
example code available at the PyTorch repository [33]. For the A2C algorithm,
we experimented with two formulations of the advantage function: the 1-step
lookahead estimate $(r_t + \gamma V(s_{t+1})) - V(s_t)$ and the $R_t - V(s_t)$ formulation.
However, we did not observe a difference in performance, so in the end we opted for
the 1-step estimate favored in the original A2C publication. For SIL-PAAC, we
implemented the SIL loss on top of the A2C implementation. There is also a 
prioritized replay buffer with an exponent of 0.6, as in the original paper. Each 
epoch, 8000 (250 batches of size 32) transitions were taken from the prioritized 
replay buffer in the SIL step of the algorithm. The size of the prioritized replay 
buffer was 40,000. The critic loss weight was set to 0.01 as in the original paper. 
For the 3SIL and behavioral cloning implementations, we sample 8000 transitions 
(250 batches of size 32) from the replay buffer or history. For the behavioral 
cloning, we used a buffer of size 40,000. An example implementation of 3SIL 
can be found in the RewriteRL repository. On the Robinson arithmetic task, for 
3SIL and BC, the evaluation is done greedily (always take the highest probability 
actions). For the other methods, we performed experiments with both greedy and 
non-greedy (sample from the predictor distribution and add 5% noise) evaluation 
and show the results of the best-performing setting (which in most cases was the
non-greedy evaluation, except for PPO). On the AIM task, we evaluate greedily 
with 3SIL. 
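For reference, the two advantage estimates compared above can be computed as follows for one finished episode. This is an illustrative sketch, not the adapted PyTorch example code: it assumes that `values[t]` holds the critic's estimate V(s_t) for each visited state and that the state after the last step is terminal with value 0 (an episode cut off at the step limit would instead need a bootstrapped value).

```python
from typing import List, Sequence


def one_step_advantages(rewards: Sequence[float], values: Sequence[float],
                        gamma: float = 0.99) -> List[float]:
    """A_t = (r_t + gamma * V(s_{t+1})) - V(s_t); the terminal state has value 0."""
    T = len(rewards)
    next_values = list(values[1:T]) + [0.0]
    return [r + gamma * nv - v for r, nv, v in zip(rewards, next_values, values)]


def return_advantages(rewards: Sequence[float], values: Sequence[float],
                      gamma: float = 0.99) -> List[float]:
    """A_t = R_t - V(s_t), where R_t is the discounted return from step t."""
    R, returns = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return [Rt - v for Rt, v in zip(returns, values)]
```

With the sparse rewards of these tasks (a reward only at the end of a successful episode), the 1-step estimate relies entirely on V to propagate that signal backwards, which is consistent with the difficulty of learning a good value function noted in the results.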

AIMLEAP expects a distance estimate for each applicable action. This rep- 
resents the estimated distance to a proof. This behavior was converted to a 
reinforcement learning setup by always setting the chosen action of the model 
to the minimum distance and all other actions to a distance larger than the
maximum proof length. Only the chosen action is then carried out. 
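The conversion can be illustrated with a small helper. AIMLEAP's real interface is not reproduced here: the function name, the dictionary output, and the use of 0 as "the minimum distance" are assumptions of this sketch. The maximum proof length of 30 in the test matches the AIM episode limit.

```python
from typing import Dict, Hashable, Sequence


def distances_for_aimleap(applicable: Sequence[Hashable], chosen: Hashable,
                          max_proof_length: int) -> Dict[Hashable, int]:
    """Give the model's chosen action the minimum distance and every other action a larger one."""
    if chosen not in applicable:
        raise ValueError("the chosen action must be applicable")
    far = max_proof_length + 1  # strictly larger than any possible proof length
    return {a: (0 if a == chosen else far) for a in applicable}
```

Since the chosen action is the unique minimum, a search that always expands the closest candidate carries out exactly the model's choice.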

Versions of the automated theorem provers used: Version 2.5 of E [39], the 
Nov 2017 version of Prover9 [26], the Feb 2018 version of Waldmeister [46]
and version 2.4.1 of Twee [43].


8 Conclusion and Future Work 


Our experiments show that a neural rewriter, trained with the 3SIL method 
that we designed, can learn to suggest useful lemmas that assist an ATP and 
improve its proving performance. With the same limit of 1 min, Prover9 managed 
to prove about 8.3% more theorems with the suggested lemmas. Furthermore, our 3SIL training method
is powerful enough to train an equational prover from zero knowledge that can 
compete with hand-engineered provers, such as Waldmeister. Our system on its 
own proves 70.2% of the unseen test problems in 60s, while Waldmeister proved 
65.5%. 

In future work, we will apply our method to other equational reasoning tasks. 
An especially interesting research direction concerns selecting which proofs to 
learn from: some sub-proofs might be more general than other sub-proofs. The 
incorporation of graph neural networks instead of tree neural networks may 
improve the performance of the predictor, since in graph neural networks infor- 
mation not only propagates from the leaves to the root, but also through all 
other connections. 
Abstract. I introduce renaming-enriched sets (rensets for short), which
are algebraic structures axiomatizing fundamental properties of renam- 
ing (also known as variable-for-variable substitution) on syntax with 
bindings. Rensets compare favorably in some respects with the well- 
known foundation based on nominal sets. In particular, renaming is a 
more fundamental operator than the nominal swapping operator and 
enjoys a simpler, equationally expressed relationship with the variable-freshness
predicate. Together with some natural axioms matching properties of the
syntactic constructors, rensets yield a truly minimalistic characterization of
λ-calculus terms as an abstract datatype: one involving
an infinite set of unconditional equations, referring only to the most fun- 
damental term operators: the constructors and renaming. This character- 
ization yields a recursion principle, which (similarly to the case of nomi- 
nal sets) can be improved by incorporating Barendregt’s variable conven- 
tion. When interpreting syntax in semantic domains, my renaming-based 
recursor is easier to deploy than the nominal recursor. My results have 
been validated with the proof assistant Isabelle/HOL. 


1 Introduction 


Formal reasoning about syntax with bindings is necessary for the meta-theory 
of logics, calculi and programming languages, and is notoriously error-prone. 
A great deal of research has been put into formal frameworks that make the 
specification of, and the reasoning about bindings more manageable. 

Researchers wishing to formalize work involving syntax with bindings must 
choose a paradigm for representing and manipulating syntax—typically a vari- 
ant of one of the “big three”: nameful (sometimes called “nominal” reflect- 
ing its best known incarnation, nominal logic [23,39]), nameless (De Bruijn) 
[4,13,49,51] and higher-order abstract syntax (HOAS) [19,20,28,34,35]. Each 
paradigm has distinct advantages and drawbacks compared with each of the 
others, some discussed at length, e.g., in [1,9] and [25, §8.5]. And there are also 
hybrid approaches, which combine some of the advantages [14,18,42,47].

A significant advantage of the nameful paradigm is that it stays close to 
the way one informally defines and manipulates syntax when describing systems 
in textbooks and research papers—where the binding variables are explicitly 
© The Author(s) 2022 
indicated. This can in principle ensure transparency of the formalization and
allows the formalizer to focus on the high-level ideas. However, it only works 
if the technical challenge faced by the nameful paradigm is properly addressed: 
enabling the seamless definition and manipulation of concepts “up to alpha- 
equivalence”, i.e., in such a way that the names of the bound variables are 
(present but nevertheless) inconsequential. This is particularly stringent in the 
case of recursion due to the binding constructors of terms not being free, hence 
not being a priori traversable recursively—in that simply writing some recursive 
clauses that traverse the constructors is not a priori guaranteed to produce a 
correct definition, but needs certain favorable conditions. The problem has been 
addressed by researchers in the form of tailored nameful recursors [23,33,39, 43, 
56,57], which are theorems that identify such favorable conditions and, based 
on them, guarantee the existence of functions that recurse over the non-free 
constructors. 

In this paper, I make a contribution to the nameful paradigm in general, 
and to nameful recursion in particular. I introduce rensets, which are algebraic 
structures axiomatizing the properties of renaming, also known as variable-for- 
variable substitution, on terms with bindings (Sect. 3). Rensets differ from nom- 
inal sets (Sect. 2.2), which form the foundation of nominal logic, by their focus 
on (not necessarily injective) renaming rather than swapping (or permutation). 
Similarly to nominal sets, rensets are pervasive: Not only do the variables and 
terms form rensets, but so do any container-type combinations of rensets. 

While lacking the pleasant symmetry of swapping, my axiomatization of 
renaming has its advantages. First, renaming is more fundamental than swap- 
ping because, at an abstract axiomatic level, renaming can define swapping but 
not vice versa (Sect. 4). The second advantage is about the ability to define
another central operator: the variable freshness predicate. While the definability 
of freshness from swapping is a signature trait of nominal logic, my renaming-based
alternative fares even better: In rensets freshness has a simple, first-order
definition (Sect. 3). This contrasts with the nominal logic definition, which
involves a second-order statement about (co)finiteness of a set of variables. The 
third advantage is largely a consequence of the second: Rensets enriched with 
constructor-like operators facilitate an equational characterization of terms with 
bindings (using an infinite set of unconditional equations), which does not seem 
possible for swapping (Sect. 5.1). This produces a recursion principle (Sect. 5.2)
which, like the nominal recursor, caters for Barendregt’s variable convention, 
and in some cases is easier to apply than the nominal recursor—for example 
when interpreting syntax in semantic domains (Sect. 5.3). 

In summary, I argue that my renaming-based axiomatization offers some 
benefits that strengthen the arsenal of the nameful paradigm: a simpler repre- 
sentation of freshness, a minimalistic equational characterization of terms, and 
a convenient recursion principle. My results are established with high confidence 
thanks to having been mechanized in Isabelle/HOL [32]. The mechanization is 
available [44] from Isabelle’s Archive of Formal Proofs. 

Here is the structure of the rest of this paper: Sect. 2 provides background
on terms with bindings and on nominal logic. Section 3 introduces rensets and
describes their basic properties. Section 4 establishes a formal connection to
nominal sets. Section 5 discusses substitutive-set-based recursion. Section 6
discusses related work. A technical report [45] associated with this paper includes an
appendix with more examples and results and more background on nominal sets. 


2 Background 


This section recalls the terms of λ-calculus and their basic operators (Sect. 2.1),
and aspects of nominal logic including nominal sets and nominal recursion 
(Sect. 2.2). 


2.1 Terms with Bindings 


I work with the paradigmatic syntax of (untyped) λ-calculus. However, my
results generalize routinely to syntaxes specified by arbitrary binding signatures 
such as the ones in [22, §2], [39,59] or [12]. 

Let Var be a countably infinite set of variables, ranged over by x, y, z etc. The
set Trm of λ-terms (or terms for short), ranged over by t, t₁, t₂ etc., is defined
by the grammar $t ::= \mathsf{Vr}\;x \mid \mathsf{Ap}\;t_1\;t_2 \mid \mathsf{Lm}\;x\;t$
with the proviso that terms are equated (identified) modulo alpha-equivalence
(also known as naming equivalence). Thus, for example, if $x \neq z \neq y$, then
Lm x (Ap (Vr x) (Vr z)) and Lm y (Ap (Vr y) (Vr z)) are considered to be the
same term. I will often omit Vr when writing terms, as in, e.g., Lm x z.

What the above specification means is (something equivalent to) the following:
One first defines the set PTrm of pre-terms as freely generated by the grammar
$p ::= \mathsf{PVr}\;x \mid \mathsf{PAp}\;p_1\;p_2 \mid \mathsf{PLm}\;x\;p$. Then one defines the
alpha-equivalence relation ≡ : PTrm → PTrm → Bool inductively, proves that it is an
equivalence, and defines Trm by quotienting PTrm by alpha-equivalence, i.e., Trm =
PTrm/≡. Finally, one proves that the pre-term constructors are compatible
with ≡, and defines the term counterpart of these constructors: Vr : Var → Trm,
Ap : Trm → Trm → Trm and Lm : Var → Trm → Trm.
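To make the pre-term level concrete, here is a small Python sketch of PTrm together with a decision procedure for ≡. It swaps in a different but standard technique: instead of the inductive definition mentioned above, it compares nameless (de Bruijn) forms, which identify exactly the alpha-equivalent pre-terms. The nameless form can also serve as a canonical representative of each class in PTrm/≡. All class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class PVr:
    x: str


@dataclass(frozen=True)
class PAp:
    p1: "PTrm"
    p2: "PTrm"


@dataclass(frozen=True)
class PLm:
    x: str
    p: "PTrm"


PTrm = Union[PVr, PAp, PLm]


def nameless(p: PTrm, binders: Optional[List[str]] = None):
    """Replace bound variables by binder distances (de Bruijn indices); keep free names."""
    binders = binders or []
    if isinstance(p, PVr):
        for index, y in enumerate(reversed(binders)):  # innermost binder first
            if y == p.x:
                return ("bound", index)
        return ("free", p.x)
    if isinstance(p, PAp):
        return ("ap", nameless(p.p1, binders), nameless(p.p2, binders))
    return ("lm", nameless(p.p, binders + [p.x]))


def alpha_equivalent(p: PTrm, q: PTrm) -> bool:
    """p ≡ q iff both pre-terms have the same nameless form."""
    return nameless(p) == nameless(q)
```

On the example from the text, PLm x (PAp (PVr x) (PVr z)) and PLm y (PAp (PVr y) (PVr z)) both become ("lm", ("ap", ("bound", 0), ("free", "z"))), so they are identified.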

The above constructions are technical, but well-understood, and can be fully 
automated for an arbitrary syntax with bindings (not just that of λ-calculus);
and tools such as the Isabelle/Nominal package [59,60] provide this automation, 
hiding pre-terms completely from the end user. In formal and informal presenta- 
tions alike, one usually prefers to forget about pre-terms, and work with terms 
only. This has several advantages, including (1) being able to formalize concepts 
at the right abstraction level (since in most applications the naming of bound 
variables should be inconsequential) and (2) the renaming operator being well- 
behaved. However, there are some difficulties that need to be overcome when 
working with terms, and in this paper I focus on one of the major ones: provid- 
ing recursion principles, i.e., mechanisms for defining functions by recursing over 
terms. This difficulty arises essentially because, unlike in the case of pre-term 
constructors, the binding constructor for terms is not free. 

The main characters of my paper will be (generalizations of) some common 
operations and relations on Trm, namely: 


Rensets and Renaming-Based Recursion for Syntax with Bindings 621 


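- the constructors $\mathsf{Vr} : \mathsf{Var} \to \mathsf{Trm}$, $\mathsf{Ap} : \mathsf{Trm} \to \mathsf{Trm} \to \mathsf{Trm}$ and $\mathsf{Lm} : \mathsf{Var} \to \mathsf{Trm} \to \mathsf{Trm}$

- (capture-avoiding) renaming, also known as (capture-avoiding) substitution of variables for variables, $\_[\_/\_] : \mathsf{Trm} \to \mathsf{Var} \to \mathsf{Var} \to \mathsf{Trm}$; e.g., assuming $x \neq y$, we have $(\mathsf{Lm}\ x\ (\mathsf{Ap}\ x\ y))[x/y] = \mathsf{Lm}\ x'\ (\mathsf{Ap}\ x'\ x)$, where $x'$ is any variable other than $x$

- swapping $\_[\_\wedge\_] : \mathsf{Trm} \to \mathsf{Var} \to \mathsf{Var} \to \mathsf{Trm}$; e.g., we have $(\mathsf{Lm}\ x\ (\mathsf{Ap}\ x\ y))[x \wedge y] = \mathsf{Lm}\ y\ (\mathsf{Ap}\ y\ x)$

- the free-variable operator $\mathsf{FV} : \mathsf{Trm} \to \mathsf{Pow}(\mathsf{Var})$ (where $\mathsf{Pow}(\mathsf{Var})$ is the powerset of Var); e.g., we have $\mathsf{FV}(\mathsf{Lm}\ x\ (\mathsf{Ap}\ y\ x)) = \{y\}$

- freshness $\_\#\_ : \mathsf{Var} \to \mathsf{Trm} \to \mathsf{Bool}$; e.g., we have $x \mathrel{\#} (\mathsf{Lm}\ x\ x)$; and assuming $x \neq y$, we have $\neg\, x \mathrel{\#} (\mathsf{Lm}\ y\ x)$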
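The operations just listed can be prototyped on pre-terms, comparing results up to alpha-equivalence. The following Python sketch is a hedged illustration, not a reference implementation: `rename(t, new, old)` stands for $t[new/old]$, `swap(t, a, b)` for $t[a \wedge b]$, `fv` for FV and `is_fresh` for $\#$, and fresh names of the hypothetical form `v0`, `v1`, ... are generated only when a binder would capture the incoming variable. All helper names are assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: "Term"
    t2: "Term"

@dataclass(frozen=True)
class Lm:
    x: str
    t: "Term"

Term = Union[Vr, Ap, Lm]

def fv(t: Term) -> frozenset:
    """Free variables FV(t)."""
    if isinstance(t, Vr):
        return frozenset({t.x})
    if isinstance(t, Ap):
        return fv(t.t1) | fv(t.t2)
    return fv(t.t) - {t.x}

def nameless(t: Term, bound: tuple = ()) -> tuple:
    """De Bruijn-style canonical form, used to compare terms up to alpha."""
    if isinstance(t, Vr):
        return ("bound", bound.index(t.x)) if t.x in bound else ("free", t.x)
    if isinstance(t, Ap):
        return ("ap", nameless(t.t1, bound), nameless(t.t2, bound))
    return ("lm", nameless(t.t, (t.x,) + bound))

def alpha_eq(s: Term, t: Term) -> bool:
    return nameless(s) == nameless(t)

def fresh_var(avoid) -> str:
    """First name of the (hypothetical) form v0, v1, ... not in `avoid`."""
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def rename(t: Term, new: str, old: str) -> Term:
    """Capture-avoiding renaming t[new/old]."""
    if isinstance(t, Vr):
        return Vr(new) if t.x == old else t
    if isinstance(t, Ap):
        return Ap(rename(t.t1, new, old), rename(t.t2, new, old))
    if t.x == old:          # old is bound here: no free occurrences to rename
        return t
    if t.x == new:          # the binder would capture `new`: rename the binder first
        z = fresh_var(fv(t.t) | {new, old})
        return Lm(z, rename(rename(t.t, z, t.x), new, old))
    return Lm(t.x, rename(t.t, new, old))

def swap_var(v: str, a: str, b: str) -> str:
    return b if v == a else a if v == b else v

def swap(t: Term, a: str, b: str) -> Term:
    """Swapping t[a ∧ b]: exchanges a and b everywhere, bound or free."""
    if isinstance(t, Vr):
        return Vr(swap_var(t.x, a, b))
    if isinstance(t, Ap):
        return Ap(swap(t.t1, a, b), swap(t.t2, a, b))
    return Lm(swap_var(t.x, a, b), swap(t.t, a, b))

def is_fresh(x: str, t: Term) -> bool:
    """x # t iff x is not free in t."""
    return x not in fv(t)

# The examples from the list above (with x != y):
t = Lm("x", Ap(Vr("x"), Vr("y")))
print(rename(t, "x", "y"))   # Lm(x='v0', t=Ap(t1=Vr(x='v0'), t2=Vr(x='x')))
print(swap(t, "x", "y"))     # Lm(x='y', t=Ap(t1=Vr(x='y'), t2=Vr(x='x')))
```

Swapping needs no capture check at all, since it renames binders and free occurrences uniformly, whereas renaming must sometimes pick a new bound name.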


The free-variable and freshness operators are of course related: A variable x 
is fresh for a term t (i.e., $x \mathrel{\#} t$) if and only if it is not free in t (i.e., $x \notin \mathsf{FV}(t)$).
The renaming operator $\_[\_/\_] : \mathsf{Trm} \to \mathsf{Var} \to \mathsf{Var} \to \mathsf{Trm}$ substitutes (in terms)
variables for variables, not terms for variables. (But an algebraization of term-for-variable substitution is discussed in [45, Appendix D].)


2.2 Background on Nominal Logic 


I will employ a formulation of nominal logic [38,39,57] that does not require any 
special logical foundation, e.g., axiomatic nominal set theory. For simplicity, I 
prefer the swapping-based formulation [38] to the equivalent permutation-based
formulation; [45, Appendix C] gives details on these two alternatives.

A pre-nominal set is a pair $\mathcal{A} = (A, \_[\_\wedge\_])$ where A is a set and $\_[\_\wedge\_] : A \to \mathsf{Var} \to \mathsf{Var} \to A$ is a function called the swapping operator of $\mathcal{A}$ satisfying the
following properties for all $a \in A$ and $x, x_1, x_2, y_1, y_2 \in \mathsf{Var}$:


Identity: $a[x \wedge x] = a$
Involution: $a[x_1 \wedge x_2][x_1 \wedge x_2] = a$
Compositionality: $a[x_1 \wedge x_2][y_1 \wedge y_2] = a[y_1 \wedge y_2][(x_1[y_1 \wedge y_2]) \wedge (x_2[y_1 \wedge y_2])]$
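The three axioms can be spot-checked mechanically for the swapping operator on terms. The Python sketch below is only a sanity check on a small, hypothetical sample of variables and terms, not a proof; it works directly on pre-terms, which is harmless here because swapping exchanges bound and free occurrences alike and therefore never causes capture. The helper names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: "Term"
    t2: "Term"

@dataclass(frozen=True)
class Lm:
    x: str
    t: "Term"

Term = Union[Vr, Ap, Lm]

def swap_var(v: str, a: str, b: str) -> str:
    """Swapping on variables: v[a ∧ b]."""
    return b if v == a else a if v == b else v

def swap(t: Term, a: str, b: str) -> Term:
    """Swapping on (pre-)terms: exchange a and b everywhere, bound or free."""
    if isinstance(t, Vr):
        return Vr(swap_var(t.x, a, b))
    if isinstance(t, Ap):
        return Ap(swap(t.t1, a, b), swap(t.t2, a, b))
    return Lm(swap_var(t.x, a, b), swap(t.t, a, b))

def check_swap_axioms(terms, variables) -> bool:
    """Test Identity, Involution and Compositionality on every sample combination."""
    for a in terms:
        if any(swap(a, x, x) != a for x in variables):              # Identity
            return False
        for x1, x2 in product(variables, repeat=2):                  # Involution
            if swap(swap(a, x1, x2), x1, x2) != a:
                return False
        for x1, x2, y1, y2 in product(variables, repeat=4):          # Compositionality
            lhs = swap(swap(a, x1, x2), y1, y2)
            rhs = swap(swap(a, y1, y2), swap_var(x1, y1, y2), swap_var(x2, y1, y2))
            if lhs != rhs:
                return False
    return True

VARS = ["x", "y", "z", "w"]
TERMS = [Vr("x"), Lm("x", Ap(Vr("x"), Vr("y"))), Lm("z", Lm("y", Ap(Vr("z"), Vr("w"))))]
print(check_swap_axioms(TERMS, VARS))  # True
```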


Given a pre-nominal set $\mathcal{A} = (A, \_[\_\wedge\_])$, an element $a \in A$ and a set $X \subseteq \mathsf{Var}$,
one says that a is supported by X if $a[x \wedge y] = a$ holds for all $x, y \in \mathsf{Var}$ such that
$x, y \notin X$. An element $a \in A$ is called finitely supported if there exists a finite
set $X \subseteq \mathsf{Var}$ such that a is supported by X. A nominal set is a pre-nominal set
$\mathcal{A} = (A, \_[\_\wedge\_])$ such that every element of A is finitely supported. If $\mathcal{A} = (A, \_[\_\wedge\_])$
is a nominal set and $a \in A$, then the smallest set $X \subseteq \mathsf{Var}$ such that a is supported
by X exists, and is denoted by $\mathsf{supp}^{\mathcal{A}}\, a$ and called the support of a. One calls a
variable x fresh for a, written $x \mathrel{\#} a$, if $x \notin \mathsf{supp}^{\mathcal{A}}\, a$.

An alternative, more direct definition of freshness (which is preferred, e.g., 
by Isabelle/Nominal [59,60]) is provided by the following proposition: 


Proposition 1. For any nominal set $\mathcal{A} = (A, \_[\_\wedge\_])$ and any $x \in \mathsf{Var}$ and $a \in A$,
it holds that $x \mathrel{\#} a$ if and only if the set $\{y \mid a[y \wedge x] \neq a\}$ is finite.
Given two pre-nominal sets $\mathcal{A} = (A, \_[\_\wedge\_])$ and $\mathcal{B} = (B, \_[\_\wedge\_])$, the set
$F = (A \to B)$ of functions from A to B becomes a pre-nominal set $\mathcal{F} = (F, \_[\_\wedge\_])$
by defining $f[x \wedge y]$ to send each $a \in A$ to $(f(a[x \wedge y]))[x \wedge y]$. $\mathcal{F}$ is not a nominal
set because not all functions are finitely supported (though of course one obtains
a nominal set by restricting to finitely supported functions).

The set of terms together with their swapping operator, $(\mathsf{Trm}, \_[\_\wedge\_])$, forms
a nominal set, where the support of a term is precisely its set of free variables. 
However, the power of nominal logic resides in the fact that not only the set of 
terms, but also many other sets can be organized as nominal sets—including the 
target domains of many functions one may wish to define on terms. This gives 
rise to a convenient mechanism for defining functions recursively on terms: 


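Theorem 2 [39]. Let $\mathcal{A} = (A, \_[\_\wedge\_])$ be a nominal set and let $\mathsf{Vr}^{\mathcal{A}} : \mathsf{Var} \to A$, $\mathsf{Ap}^{\mathcal{A}} : A \to A \to A$ and $\mathsf{Lm}^{\mathcal{A}} : \mathsf{Var} \to A \to A$ be some functions, all
supported by a finite set X of variables and with $\mathsf{Lm}^{\mathcal{A}}$ satisfying the following
freshness condition for binders (FCB): There exists $x \in \mathsf{Var}$ such that $x \notin X$
and $x \mathrel{\#} \mathsf{Lm}^{\mathcal{A}}\, x\, a$ for all $a \in A$.

Then there exists a unique function $f : \mathsf{Trm} \to A$ that is supported by X
and such that the following hold for all $x \in \mathsf{Var}$ and $t_1, t_2, t \in \mathsf{Trm}$:


(i) $f\,(\mathsf{Vr}\ x) = \mathsf{Vr}^{\mathcal{A}}\, x$ (ii) $f\,(\mathsf{Ap}\ t_1\ t_2) = \mathsf{Ap}^{\mathcal{A}}\,(f\,t_1)\,(f\,t_2)$
(iii) $f\,(\mathsf{Lm}\ x\ t) = \mathsf{Lm}^{\mathcal{A}}\, x\,(f\,t)$ if $x \notin X$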


A useful feature of nominal recursion is the support for Barendregt’s famous
variable convention [8, p. 26]: “If [the terms] $t_1, \ldots, t_n$ occur in a certain mathematical
context (e.g. definition, proof), then in these terms all bound variables
are chosen to be different from the free variables.” The above recursion principle
adheres to this convention by fixing a finite set X of variables meant to be
free in the definition context and guaranteeing that the bound variables in the
definitional clauses are distinct from them. Formally, the target domain operators
$\mathsf{Vr}^{\mathcal{A}}$, $\mathsf{Ap}^{\mathcal{A}}$ and $\mathsf{Lm}^{\mathcal{A}}$ are supported by X, and the clause for λ-abstraction
is conditioned by the binding variable x being outside of X. (The Barendregt
convention is also present in nominal logic via induction principles [39,58-60].)
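The way the avoidance set X enters the definitional clauses can be illustrated with a small recursor on pre-terms. In this hedged Python sketch (the names `fold` and `subst` are hypothetical), a binder that lies in X is first renamed to a fresh variable, so the λ-clause is only ever used with a binding variable outside X, as in clause (iii) of Theorem 2. The sketch does not check the support or FCB hypotheses; it simply exercises the clauses on term-for-variable substitution, taking $X = \mathsf{FV}(s) \cup \{x\}$.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Union

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: "Term"
    t2: "Term"

@dataclass(frozen=True)
class Lm:
    x: str
    t: "Term"

Term = Union[Vr, Ap, Lm]

def fv(t: Term) -> frozenset:
    if isinstance(t, Vr):
        return frozenset({t.x})
    if isinstance(t, Ap):
        return fv(t.t1) | fv(t.t2)
    return fv(t.t) - {t.x}

def nameless(t: Term, bound: tuple = ()) -> tuple:
    """De Bruijn-style canonical form, used to compare terms up to alpha."""
    if isinstance(t, Vr):
        return ("bound", bound.index(t.x)) if t.x in bound else ("free", t.x)
    if isinstance(t, Ap):
        return ("ap", nameless(t.t1, bound), nameless(t.t2, bound))
    return ("lm", nameless(t.t, (t.x,) + bound))

def alpha_eq(s: Term, t: Term) -> bool:
    return nameless(s) == nameless(t)

def fresh_var(avoid) -> str:
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def rename(t: Term, new: str, old: str) -> Term:
    """Capture-avoiding renaming t[new/old]."""
    if isinstance(t, Vr):
        return Vr(new) if t.x == old else t
    if isinstance(t, Ap):
        return Ap(rename(t.t1, new, old), rename(t.t2, new, old))
    if t.x == old:
        return t
    if t.x == new:
        z = fresh_var(fv(t.t) | {new, old})
        return Lm(z, rename(rename(t.t, z, t.x), new, old))
    return Lm(t.x, rename(t.t, new, old))

def fold(t: Term, vr: Callable, ap: Callable, lm: Callable, avoid: frozenset):
    """Structural recursion that applies the Lm-clause only to binders outside
    `avoid` (the set X), renaming a clashing binder to a fresh variable first."""
    if isinstance(t, Vr):
        return vr(t.x)
    if isinstance(t, Ap):
        return ap(fold(t.t1, vr, ap, lm, avoid), fold(t.t2, vr, ap, lm, avoid))
    x, body = t.x, t.t
    if x in avoid:
        z = fresh_var(avoid | fv(body) | {x})
        x, body = z, rename(body, z, x)   # Lm z body[z/x] is alpha-equivalent to t
    return lm(x, fold(body, vr, ap, lm, avoid))

def subst(t: Term, s: Term, x: str) -> Term:
    """Term-for-variable substitution t[s/x], taking X = FV(s) ∪ {x}."""
    return fold(t,
                vr=lambda y: s if y == x else Vr(y),
                ap=Ap,
                lm=Lm,
                avoid=fv(s) | {x})

print(subst(Lm("y", Ap(Vr("x"), Vr("y"))), Vr("y"), "x"))
# Lm(x='v0', t=Ap(t1=Vr(x='y'), t2=Vr(x='v0')))
```

Without the renaming step, the naive clause would produce `Lm("y", Ap(Vr("y"), Vr("y")))`, which captures the substituted y; the avoidance set is what rules this out.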


3 Rensets 


This section introduces rensets, an alternative to nominal sets that axiomatize 
renaming rather than swapping or permutation. 

A renaming-enriched set (renset for short) is a pair $\mathcal{A} = (A, \_[\_/\_])$ where A
is a set and $\_[\_/\_] : A \to \mathsf{Var} \to \mathsf{Var} \to A$ is an operator such that the following
hold for all $x, x_1, x_2, x_3, y, y_1, y_2 \in \mathsf{Var}$ and $a \in A$:


Identity: $a[x/x] = a$
Idempotence: If $x_1 \neq y$ then $a[x_1/y][x_2/y] = a[x_1/y]$
Chaining: If $y \neq x_2$ then $a[y/x_2][x_2/x_1][x_3/x_2] = a[y/x_2][x_3/x_1]$
Commutativity: If $x_2 \neq y_1 \neq x_1 \neq y_2$ then $a[x_2/x_1][y_2/y_1] = a[y_2/y_1][x_2/x_1]$
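The axioms can be exercised on the two basic examples of rensets, variables and terms. The following Python sketch is a bounded sanity check under stated assumptions (a four-variable sample, a handful of hypothetical test terms, alpha-equivalence decided by a nameless canonical form), not a proof; `ren(a, new, old)` stands for $a[new/old]$, and all helper names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count, product
from typing import Union

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: "Term"
    t2: "Term"

@dataclass(frozen=True)
class Lm:
    x: str
    t: "Term"

Term = Union[Vr, Ap, Lm]

def fv(t: Term) -> frozenset:
    if isinstance(t, Vr):
        return frozenset({t.x})
    if isinstance(t, Ap):
        return fv(t.t1) | fv(t.t2)
    return fv(t.t) - {t.x}

def nameless(t: Term, bound: tuple = ()) -> tuple:
    if isinstance(t, Vr):
        return ("bound", bound.index(t.x)) if t.x in bound else ("free", t.x)
    if isinstance(t, Ap):
        return ("ap", nameless(t.t1, bound), nameless(t.t2, bound))
    return ("lm", nameless(t.t, (t.x,) + bound))

def alpha_eq(s: Term, t: Term) -> bool:
    return nameless(s) == nameless(t)

def fresh_var(avoid) -> str:
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def rename(t: Term, new: str, old: str) -> Term:
    """Capture-avoiding renaming on terms: t[new/old]."""
    if isinstance(t, Vr):
        return Vr(new) if t.x == old else t
    if isinstance(t, Ap):
        return Ap(rename(t.t1, new, old), rename(t.t2, new, old))
    if t.x == old:
        return t
    if t.x == new:
        z = fresh_var(fv(t.t) | {new, old})
        return Lm(z, rename(rename(t.t, z, t.x), new, old))
    return Lm(t.x, rename(t.t, new, old))

def ren_var(v: str, new: str, old: str) -> str:
    """Renaming on variables: v[new/old]."""
    return new if v == old else v

def check_renset_axioms(elems, ren, eq, variables) -> bool:
    """Exhaustively test the four renset axioms, with ren(a, new, old) = a[new/old]."""
    V = variables
    for a in elems:
        if not all(eq(ren(a, x, x), a) for x in V):                          # Identity
            return False
        for x1, x2, y in product(V, repeat=3):                               # Idempotence
            if x1 != y and not eq(ren(ren(a, x1, y), x2, y), ren(a, x1, y)):
                return False
        for x1, x2, x3, y in product(V, repeat=4):                           # Chaining
            if y != x2 and not eq(ren(ren(ren(a, y, x2), x2, x1), x3, x2),
                                  ren(ren(a, y, x2), x3, x1)):
                return False
        for x1, x2, y1, y2 in product(V, repeat=4):                          # Commutativity
            if x2 != y1 != x1 != y2 and not eq(ren(ren(a, x2, x1), y2, y1),
                                               ren(ren(a, y2, y1), x2, x1)):
                return False
    return True

VARS = ["x", "y", "z", "w"]
TERMS = [Vr("x"), Lm("x", Ap(Vr("x"), Vr("y"))),
         Lm("y", Ap(Vr("x"), Vr("z"))), Ap(Lm("w", Vr("w")), Vr("z"))]
print(check_renset_axioms(VARS, ren_var, lambda a, b: a == b, VARS))
print(check_renset_axioms(TERMS, rename, alpha_eq, VARS))
```

Python's chained comparison `x2 != y1 != x1 != y2` reads exactly like the side condition of Commutativity: it abbreviates the three disequalities $x_2 \neq y_1$, $y_1 \neq x_1$ and $x_1 \neq y_2$.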
Let us call A the carrier of $\mathcal{A}$ and $\_[\_/\_]$ the renaming operator of $\mathcal{A}$. Similarly
to the case of terms, we think of the elements $a \in A$ as some kind of variable-bearing
entities and of $a[y/x]$ as the result of substituting x with y in a. With
this intuition, the above properties are natural: Identity says that substituting a
variable with itself has no effect. Idempotence acknowledges the fact that, after
its renaming, a variable y is no longer there, so substituting it again has no
effect. Chaining says that a chain of renamings $x_3/x_2/x_1$ has the same effect
as the end-to-end renaming $x_3/x_1$, provided there is no interference from $x_2$,
which is ensured by initially substituting $x_2$ with some other variable y. Finally,
Commutativity allows the reordering of any two independent renamings.


Examples. $(\mathsf{Var}, \_[\_/\_])$ and $(\mathsf{Trm}, \_[\_/\_])$, the sets of variables and terms with
the standard renaming operator on them, form rensets. Moreover, given any
functor F on the category of sets and a renset $\mathcal{A} = (A, \_[\_/\_])$, let us define the
renset $F\,\mathcal{A} = (F\,A, \_[\_/\_])$ as follows: for any $k \in F\,A$ and $x, y \in \mathsf{Var}$, $k[x/y] =
F\,(\_[x/y])\,k$, where the last occurrence of F refers to the action of the functor
on morphisms. This means that one can freely build new rensets from existing 
ones using container types (which are particular kinds of functors)—e.g., lists, 
sets, trees etc. Another way to put it: Rensets are closed under datatype and 
codatatype constructions [55]. 
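The functor construction can be sketched for two familiar containers. In the hedged Python example below (helper names hypothetical), the list functor acts on a renaming by mapping it over the elements and the finite-powerset functor by taking images; lifting twice yields lists of lists. The renset axioms are then spot-checked on a small sample, which illustrates the closure property but does not prove it.

```python
from itertools import product

def ren_var(v: str, new: str, old: str) -> str:
    """Renaming on variables: v[new/old]."""
    return new if v == old else v

def lift_list(ren):
    """List functor: k[new/old] = map (_[new/old]) k (tuples used as lists)."""
    return lambda k, new, old: tuple(ren(e, new, old) for e in k)

def lift_set(ren):
    """Finite-powerset functor: k[new/old] is the image of k under _[new/old]."""
    return lambda k, new, old: frozenset(ren(e, new, old) for e in k)

def check_renset_axioms(elems, ren, variables) -> bool:
    """Exhaustive check of Identity, Idempotence, Chaining and Commutativity."""
    V = variables
    for a in elems:
        if any(ren(a, x, x) != a for x in V):
            return False
        for x1, x2, y in product(V, repeat=3):
            if x1 != y and ren(ren(a, x1, y), x2, y) != ren(a, x1, y):
                return False
        for x1, x2, x3, y in product(V, repeat=4):
            if y != x2 and ren(ren(ren(a, y, x2), x2, x1), x3, x2) != ren(ren(a, y, x2), x3, x1):
                return False
        for x1, x2, y1, y2 in product(V, repeat=4):
            if x2 != y1 != x1 != y2 and ren(ren(a, x2, x1), y2, y1) != ren(ren(a, y2, y1), x2, x1):
                return False
    return True

VARS = ["x", "y", "z", "w"]
ren_list = lift_list(ren_var)
ren_set = lift_set(ren_var)
ren_list_list = lift_list(ren_list)   # lists of lists: lifting composes

print(ren_list(("x", "y", "z"), "x", "z"))        # ('x', 'y', 'x')
print(ren_set(frozenset({"x", "y"}), "x", "y"))   # frozenset({'x'})
```

Because each axiom is an equation between composites of renaming maps, and functors preserve composition, the same check would succeed for any container built this way.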

In what follows, let us fix a renset $\mathcal{A} = (A, \_[\_/\_])$. One can define the notion
of freshness of a variable for an element of A in the style of nominal logic. But
the next proposition shows that simpler formulations are available. 


Proposition 3. The following are equivalent:
(1) The set $\{y \in \mathsf{Var} \mid a[y/x] \neq a\}$ is finite.
(2) $a[y/x] = a$ for all $y \in \mathsf{Var}$. (3) $a[y/x] = a$ for some $y \in \mathsf{Var} \setminus \{x\}$.


Let us define the predicate $\_\#\_ : \mathsf{Var} \to A \to \mathsf{Bool}$ as follows: $x \mathrel{\#} a$, read x
is fresh for a, if any of Proposition 3’s equivalent properties holds.

Thus, points (1)-(3) above are three alternative formulations of $x \mathrel{\#} a$, all
referring to the lack of effect of substituting y for x, expressed as $a[y/x] = a$:
namely that this phenomenon affects (1) all but a finite number of variables y,
(2) all variables y, or (3) some variable $y \neq x$. The first formulation is the most
complex of the three—it is the nominal definition, but using renaming instead 
of swapping. The other two formulations do not have counterparts in nominal 
logic, essentially because swapping is not as “efficient” as renaming at exposing 
freshness. In particular, (3) does not have a nominal counterpart because there is 
no single-swapping litmus test for freshness. The closest we can get to property 
(3) in a nominal set is the following: x is fresh for a if and only if $a[y \wedge x] = a$ holds
for some fresh y; but this needs freshness to explain freshness!


Examples (continued). For the rensets of variables and terms, freshness 
defined as above coincides with the expected operators: distinctness in the case 
of variables and standard freshness in the case of terms. And applying the defini- 
tion of freshness to rensets obtained using finitary container types has similarly 
intuitive outcomes; for example, the freshness of a variable x for a list of items
$[a_1, \ldots, a_n]$ means that x is fresh for each item $a_i$ in the list.
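Proposition 3(3) turns freshness into a single equality test: pick any one variable $y \neq x$ and check whether $a[y/x] = a$. The Python sketch below applies this test generically (assuming a renaming function and an equality on the carrier are supplied) to terms, compared up to alpha-equivalence, and to lists of variables, and compares it with the expected answers. The primed-name choice of y and all helper names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: "Term"
    t2: "Term"

@dataclass(frozen=True)
class Lm:
    x: str
    t: "Term"

Term = Union[Vr, Ap, Lm]

def fv(t: Term) -> frozenset:
    if isinstance(t, Vr):
        return frozenset({t.x})
    if isinstance(t, Ap):
        return fv(t.t1) | fv(t.t2)
    return fv(t.t) - {t.x}

def nameless(t: Term, bound: tuple = ()) -> tuple:
    if isinstance(t, Vr):
        return ("bound", bound.index(t.x)) if t.x in bound else ("free", t.x)
    if isinstance(t, Ap):
        return ("ap", nameless(t.t1, bound), nameless(t.t2, bound))
    return ("lm", nameless(t.t, (t.x,) + bound))

def alpha_eq(s: Term, t: Term) -> bool:
    return nameless(s) == nameless(t)

def fresh_var(avoid) -> str:
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def rename(t: Term, new: str, old: str) -> Term:
    """Capture-avoiding renaming t[new/old]."""
    if isinstance(t, Vr):
        return Vr(new) if t.x == old else t
    if isinstance(t, Ap):
        return Ap(rename(t.t1, new, old), rename(t.t2, new, old))
    if t.x == old:
        return t
    if t.x == new:
        z = fresh_var(fv(t.t) | {new, old})
        return Lm(z, rename(rename(t.t, z, t.x), new, old))
    return Lm(t.x, rename(t.t, new, old))

def ren_list(k: tuple, new: str, old: str) -> tuple:
    """Renaming on lists of variables, lifted elementwise (list functor)."""
    return tuple(new if v == old else v for v in k)

def fresh_by_renaming(x: str, a, ren, eq) -> bool:
    """Proposition 3(3): x # a iff a[y/x] = a for some y != x.
    By the equivalence with 3(2), any single y != x decides the question."""
    y = x + "'"          # hypothetical choice: any variable other than x would do
    return eq(ren(a, y, x), a)

TERMS = [Lm("x", Ap(Vr("x"), Vr("y"))), Ap(Vr("z"), Lm("y", Vr("y"))), Vr("x")]
for t in TERMS:
    print(t, {v: fresh_by_renaming(v, t, rename, alpha_eq) for v in ["x", "y", "z"]})
```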
Freshness satisfies some intuitive properties, which can be easily proved from
its definition and the renset axioms. In particular, point (2) of the next propo- 
sition is the freshness-based version of the Chaining axiom. 


Proposition 4. The following hold:
(1) If $x \mathrel{\#} a$ then $a[y/x] = a$ (2) If $x_2 \mathrel{\#} a$ then $a[x_2/x_1][x_3/x_2] = a[x_3/x_1]$
(3) If $z \mathrel{\#} a$ or $z = x$, and $x \mathrel{\#} a$ or $z \neq y$, then $z \mathrel{\#} a[y/x]$


4 Connection to Nominal Sets 


So far I focused on consequences of the purely equational theory of rensets, with- 
out making any assumption about cardinality. But after additionally postulating 
a nominal-style finite support property, one can show that rensets give rise to 
nominal sets—which is what I will do in this section. 

Let us say that a renset $\mathcal{A} = (A, \_[\_/\_])$ has the Finite Support property if,
for all $a \in A$, the set $\{x \in \mathsf{Var} \mid \neg\, x \mathrel{\#} a\}$ is finite.

Let $\mathcal{A} = (A, \_[\_/\_])$ be a renset satisfying Finite Support. Let us define the
swapping operator $\_[\_\wedge\_] : A \to \mathsf{Var} \to \mathsf{Var} \to A$ as follows: $a[x_1 \wedge x_2] =
a[y/x_1][x_1/x_2][x_2/y]$, where y is a variable that is fresh for all the involved items,
namely $y \notin \{x_1, x_2\}$ and $y \mathrel{\#} a$. Indeed, this is how one would define swapping
from renaming on terms: using a fresh auxiliary variable y, and exploiting that 
such a fresh y exists and that its choice is immaterial for the end result. The 
next lemma shows that this style of definition also works abstractly, i.e., all it 
needs are the renset axioms plus Finite Support. 
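The definition can be tried out on terms, where a direct swapping operator is also available for comparison. The Python sketch below (helper names hypothetical, with the auxiliary variable chosen by a naive fresh-name generator unless given explicitly) computes $a[y/x_1][x_1/x_2][x_2/y]$ and checks, on a small sample, both that it agrees with direct swapping up to alpha-equivalence and that two different admissible choices of y give the same result, as Lemma 5(2) predicts.

```python
from dataclasses import dataclass
from itertools import count, product
from typing import Optional, Union

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: "Term"
    t2: "Term"

@dataclass(frozen=True)
class Lm:
    x: str
    t: "Term"

Term = Union[Vr, Ap, Lm]

def fv(t: Term) -> frozenset:
    if isinstance(t, Vr):
        return frozenset({t.x})
    if isinstance(t, Ap):
        return fv(t.t1) | fv(t.t2)
    return fv(t.t) - {t.x}

def nameless(t: Term, bound: tuple = ()) -> tuple:
    if isinstance(t, Vr):
        return ("bound", bound.index(t.x)) if t.x in bound else ("free", t.x)
    if isinstance(t, Ap):
        return ("ap", nameless(t.t1, bound), nameless(t.t2, bound))
    return ("lm", nameless(t.t, (t.x,) + bound))

def alpha_eq(s: Term, t: Term) -> bool:
    return nameless(s) == nameless(t)

def fresh_var(avoid) -> str:
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def rename(t: Term, new: str, old: str) -> Term:
    """Capture-avoiding renaming t[new/old]."""
    if isinstance(t, Vr):
        return Vr(new) if t.x == old else t
    if isinstance(t, Ap):
        return Ap(rename(t.t1, new, old), rename(t.t2, new, old))
    if t.x == old:
        return t
    if t.x == new:
        z = fresh_var(fv(t.t) | {new, old})
        return Lm(z, rename(rename(t.t, z, t.x), new, old))
    return Lm(t.x, rename(t.t, new, old))

def swap_var(v: str, a: str, b: str) -> str:
    return b if v == a else a if v == b else v

def swap(t: Term, a: str, b: str) -> Term:
    """Direct swapping, used only as a reference point."""
    if isinstance(t, Vr):
        return Vr(swap_var(t.x, a, b))
    if isinstance(t, Ap):
        return Ap(swap(t.t1, a, b), swap(t.t2, a, b))
    return Lm(swap_var(t.x, a, b), swap(t.t, a, b))

def swap_via_renaming(a: Term, x1: str, x2: str, y: Optional[str] = None) -> Term:
    """a[x1 ∧ x2] := a[y/x1][x1/x2][x2/y] with y not in {x1, x2} and y # a."""
    if y is None:
        y = fresh_var(fv(a) | {x1, x2})
    if y in (x1, x2) or y in fv(a):
        raise ValueError("the auxiliary variable y must be fresh")
    return rename(rename(rename(a, y, x1), x1, x2), x2, y)

VARS = ["x", "y", "z"]
TERMS = [Vr("x"), Lm("x", Ap(Vr("x"), Vr("y"))), Lm("y", Lm("z", Ap(Vr("x"), Vr("y"))))]
print(all(alpha_eq(swap_via_renaming(t, a, b), swap(t, a, b))
          for t in TERMS for a, b in product(VARS, repeat=2)))  # True
```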


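Lemma 5. The following hold for all $x_1, x_2 \in \mathsf{Var}$ and $a \in A$:


(1) There exists $y \in \mathsf{Var}$ such that $y \notin \{x_1, x_2\}$ and $y \mathrel{\#} a$.
(2) For all $y, y' \in \mathsf{Var}$ such that $y \notin \{x_1, x_2\}$, $y \mathrel{\#} a$, $y' \notin \{x_1, x_2\}$ and $y' \mathrel{\#} a$,
$a[y/x_1][x_1/x_2][x_2/y] = a[y'/x_1][x_1/x_2][x_2/y']$.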


And one indeed obtains an operator satisfying the nominal axioms: 


Proposition 6. If $(A, \_[\_/\_])$ is a renset satisfying Finite Support, then
$(A, \_[\_\wedge\_])$ is a nominal set. Moreover, $(A, \_[\_/\_])$ and $(A, \_[\_\wedge\_])$ have the same
notion of freshness, in that the freshness operator defined from renaming coin- 
cides with that defined from swapping. 


The above construction is functorial, as I detail next. Given two nominal
sets $\mathcal{A} = (A, \_[\_\wedge\_])$ and $\mathcal{B} = (B, \_[\_\wedge\_])$, a nominal morphism $f : \mathcal{A} \to \mathcal{B}$
is a function $f : A \to B$ with the property that it commutes with swapping,
in that $(f\,a)[x \wedge y] = f(a[x \wedge y])$ for all $a \in A$ and $x, y \in \mathsf{Var}$. Nominal sets
and nominal morphisms form a category that I will denote by Nom. Similarly,
let us define a (substitutive) morphism $f : \mathcal{A} \to \mathcal{B}$ between two rensets $\mathcal{A} = (A, \_[\_/\_])$ and
$\mathcal{B} = (B, \_[\_/\_])$ to be a function $f : A \to B$ that commutes with renaming, yielding
the category Sbs of rensets. Let us write FSbs for the full subcategory of Sbs
given by rensets that satisfy Finite Support. Let us define $F : \mathsf{FSbs} \to \mathsf{Nom}$ to be
an operator on objects and morphisms that sends each finite-support renset to
the above described nominal set constructed from it, and sends each substitutive
morphism to itself.


Theorem 7. F is a functor between FSbs and Nom which is injective on objects 
and full and faithful (i.e., bijective on morphisms). 


One may ask whether it is also possible to make the trip back: from nominal sets
to rensets. The answer is negative, at least if one wants to retain the same
notion of freshness, i.e., have the freshness predicate defined in the nominal set 
be identical to the one defined in the resulting renset. This is because swapping 
preserves the cardinality of the support, whereas renaming must be allowed to 
change it since it might perform a non-injective renaming. The following example 
captures this idea: 


Counterexample. Let $\mathcal{A} = (A, \_[\_\wedge\_])$ be a nominal set such that all elements of
A have their support consisting of exactly two variables.
(For example, A can be the set of all terms with exactly two free variables; this is
indeed a nominal subset of the term nominal set because it is closed under
swapping.) Assume for a contradiction that $\_[\_/\_]$ is an operation on A that makes
$(A, \_[\_/\_])$ a renset with its induced freshness operator equal to that of $\mathcal{A}$. Take
any $a \in A$, and let x and y (with $x \neq y$) be the two variables in its support. Then,
by the definition of A, $a[y/x]$ needs to have exactly two non-fresh variables. But
this is impossible, since by Proposition 4(3), all the variables different from y
(including x) must be fresh for $a[y/x]$. In particular, $\mathcal{A}$ is not in the image of
the functor $F : \mathsf{FSbs} \to \mathsf{Nom}$, which is therefore not surjective on objects.
Thus, at an abstract algebraic level renaming can define swapping, but not 
the other way around. This is not too surprising, since swapping is fundamen- 
tally bijective whereas renaming is not; but it further validates our axioms for 
renaming, highlighting their ability to define a well-behaved swapping. 


5 Recursion Based on Rensets 


Proposition 3 shows that, in rensets, renaming can define freshness using only 
equality and universal or existential quantification over variables—without need- 
ing any cardinality condition like in the case of swapping. As I am about to dis- 
cuss, this forms the basis of a characterization of terms as the initial algebra of 
an equational theory (Sect. 5.1) and an expressive recursion principle (Sect. 5.2) 
that fares better than the nominal one for interpretations in semantic domains 
(Sect. 5.3). 


5.1 Equational Characterization of the Term Datatype 


Rensets contain elements that are “term-like” in as much as there is a renam- 
ing operator on them satisfying familiar properties of renaming on terms. This 
similarity with terms can be strengthened by enriching rensets with operators 
having arities that match those of the term constructors. 

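A constructor-enriched renset (CE renset for short) is a tuple $\mathcal{A} =
(A, \_[\_/\_], \mathsf{Vr}^{\mathcal{A}}, \mathsf{Ap}^{\mathcal{A}}, \mathsf{Lm}^{\mathcal{A}})$ where:

- $(A, \_[\_/\_])$ is a renset
- $\mathsf{Vr}^{\mathcal{A}} : \mathsf{Var} \to A$, $\mathsf{Ap}^{\mathcal{A}} : A \to A \to A$ and $\mathsf{Lm}^{\mathcal{A}} : \mathsf{Var} \to A \to A$ are functions


such that the following hold for all $a, a_1, a_2 \in A$ and $x, y, z \in \mathsf{Var}$: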


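(S1) $(\mathsf{Vr}^{\mathcal{A}}\, x)[y/z] = \mathsf{Vr}^{\mathcal{A}}\,(x[y/z])$

(S2) $(\mathsf{Ap}^{\mathcal{A}}\, a_1\, a_2)[y/z] = \mathsf{Ap}^{\mathcal{A}}\,(a_1[y/z])\,(a_2[y/z])$

(S3) if $x \notin \{y, z\}$ then $(\mathsf{Lm}^{\mathcal{A}}\, x\, a)[y/z] = \mathsf{Lm}^{\mathcal{A}}\, x\,(a[y/z])$
(S4) $(\mathsf{Lm}^{\mathcal{A}}\, x\, a)[y/x] = \mathsf{Lm}^{\mathcal{A}}\, x\, a$

(S5) if $z \neq y$ then $\mathsf{Lm}^{\mathcal{A}}\, x\,(a[z/y]) = \mathsf{Lm}^{\mathcal{A}}\, y\,(a[z/y][y/x])$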


Let us call $\mathsf{Vr}^{\mathcal{A}}$, $\mathsf{Ap}^{\mathcal{A}}$, $\mathsf{Lm}^{\mathcal{A}}$ the constructors of $\mathcal{A}$. (S1)-(S3) express the
constructors’ commutation with renaming (with capture-avoidance provisions in the case
of (S3)), (S4) the lack of effect of substituting for a bound variable, and (S5)
the possibility to rename a bound variable without changing the abstracted item
(where the inner renaming of $z \neq y$ for y ensures the freshness of the “new name”
y, hence its lack of interference with the other names in the “term-like” entity 
where the renaming takes place). All these are well-known to hold for terms: 


Example. Terms with renaming and the constructors, namely $(\mathsf{Trm}, \_[\_/\_], \mathsf{Vr},
\mathsf{Ap}, \mathsf{Lm})$, form a CE renset which will be denoted by $\mathbf{Trm}$.
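These facts about terms can be spot-checked using a standard capture-avoiding renaming on pre-terms. The Python code below is a bounded test, not a proof: it checks (S1)-(S5) over a four-variable sample and a few hypothetical test terms, comparing results up to alpha-equivalence via a nameless canonical form. All helper names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count, product
from typing import Union

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: "Term"
    t2: "Term"

@dataclass(frozen=True)
class Lm:
    x: str
    t: "Term"

Term = Union[Vr, Ap, Lm]

def fv(t: Term) -> frozenset:
    if isinstance(t, Vr):
        return frozenset({t.x})
    if isinstance(t, Ap):
        return fv(t.t1) | fv(t.t2)
    return fv(t.t) - {t.x}

def nameless(t: Term, bound: tuple = ()) -> tuple:
    if isinstance(t, Vr):
        return ("bound", bound.index(t.x)) if t.x in bound else ("free", t.x)
    if isinstance(t, Ap):
        return ("ap", nameless(t.t1, bound), nameless(t.t2, bound))
    return ("lm", nameless(t.t, (t.x,) + bound))

def alpha_eq(s: Term, t: Term) -> bool:
    return nameless(s) == nameless(t)

def fresh_var(avoid) -> str:
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def rename(t: Term, new: str, old: str) -> Term:
    """Capture-avoiding renaming t[new/old]."""
    if isinstance(t, Vr):
        return Vr(new) if t.x == old else t
    if isinstance(t, Ap):
        return Ap(rename(t.t1, new, old), rename(t.t2, new, old))
    if t.x == old:
        return t
    if t.x == new:
        z = fresh_var(fv(t.t) | {new, old})
        return Lm(z, rename(rename(t.t, z, t.x), new, old))
    return Lm(t.x, rename(t.t, new, old))

def ren_var(v: str, new: str, old: str) -> str:
    return new if v == old else v

def check_ce_axioms(terms, variables) -> bool:
    V = variables
    for a1, a2 in product(terms, repeat=2):                                    # (S2)
        for y, z in product(V, repeat=2):
            if not alpha_eq(rename(Ap(a1, a2), y, z), Ap(rename(a1, y, z), rename(a2, y, z))):
                return False
    for x, y, z in product(V, repeat=3):
        if rename(Vr(x), y, z) != Vr(ren_var(x, y, z)):                        # (S1)
            return False
        for a in terms:
            la = Lm(x, a)
            if x not in (y, z) and not alpha_eq(rename(la, y, z), Lm(x, rename(a, y, z))):  # (S3)
                return False
            if not alpha_eq(rename(la, y, x), la):                              # (S4)
                return False
            if z != y and not alpha_eq(Lm(x, rename(a, z, y)),
                                       Lm(y, rename(rename(a, z, y), y, x))):  # (S5)
                return False
    return True

VARS = ["x", "y", "z", "w"]
TERMS = [Vr("x"), Ap(Vr("x"), Vr("y")), Lm("x", Ap(Vr("x"), Vr("y"))), Lm("z", Vr("w"))]
print(check_ce_axioms(TERMS, VARS))
```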

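As it turns out, the CE renset axioms capture exactly the term structure
$\mathbf{Trm}$, via initiality. The notion of CE substitutive morphism $f : \mathcal{A} \to \mathcal{B}$ between
two CE rensets $\mathcal{A} = (A, \_[\_/\_], \mathsf{Vr}^{\mathcal{A}}, \mathsf{Ap}^{\mathcal{A}}, \mathsf{Lm}^{\mathcal{A}})$ and $\mathcal{B} = (B, \_[\_/\_], \mathsf{Vr}^{\mathcal{B}}, \mathsf{Ap}^{\mathcal{B}}, \mathsf{Lm}^{\mathcal{B}})$
is the expected one: a function $f : A \to B$ that is a substitutive morphism and
also commutes with the constructors. Let us write $\mathsf{Sbs}_{\mathrm{CE}}$ for the category of CE
rensets and morphisms.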


Theorem 8. $\mathbf{Trm}$ is the initial CE renset, i.e., the initial object in $\mathsf{Sbs}_{\mathrm{CE}}$.


Proof Idea. Let $\mathcal{A} = (A, \_[\_/\_], \mathsf{Vr}^{\mathcal{A}}, \mathsf{Ap}^{\mathcal{A}}, \mathsf{Lm}^{\mathcal{A}})$ be a CE renset. Instead of directly
going after a function $f : \mathsf{Trm} \to A$, one first inductively defines a relation
$R : \mathsf{Trm} \to A \to \mathsf{Bool}$, with inductive clauses reflecting the desired properties
concerning the commutation with the constructors, e.g., $\dfrac{R\ t\ a}{R\ (\mathsf{Lm}\ x\ t)\ (\mathsf{Lm}^{\mathcal{A}}\, x\, a)}$. It
suffices to prove that R is total and functional and preserves renaming, since
that allows one to define a constructor- and renaming-preserving function (a
morphism) f by taking $f\,t$ to be the unique a with $R\ t\ a$.

Proving that R is total is easy by standard induction on terms. Proving the 
other two properties, namely functionality and preservation of renaming, is more 
elaborate and requires their simultaneous proof together with a third property: 
that R preserves freshness. The simultaneous three-property proof follows by a 
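form of “substitutive induction” on terms: Given a predicate $\varphi : \mathsf{Trm} \to \mathsf{Bool}$,
to show $\forall t \in \mathsf{Trm}.\ \varphi\, t$ it suffices to show the following: (1) $\forall x \in \mathsf{Var}.\ \varphi\,(\mathsf{Vr}\ x)$,
(2) $\forall t_1, t_2 \in \mathsf{Trm}.\ \varphi\, t_1 \wedge \varphi\, t_2 \to \varphi\,(\mathsf{Ap}\ t_1\ t_2)$, and (3) $\forall x \in \mathsf{Var}, t \in \mathsf{Trm}.\ (\forall s \in
\mathsf{Trm}.\ \mathsf{Con}_{[\_/\_]}\ t\ s \to \varphi\, s) \to \varphi\,(\mathsf{Lm}\ x\ t)$, where $\mathsf{Con}_{[\_/\_]}\ t\ s$ means that t is
connected to s by a chain of renamings.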

Roughly speaking, R turns out to be functional because the λ-abstraction
operator on the “term-like” inhabitants of A is, thanks to the axioms of CE
renset, at least as non-injective as (i.e., identifies at least as many items as) the
λ-abstraction operator on terms.


Theorem 8 is the central result of this paper, from both practical and theo- 
retical perspectives. Practically, it enables a useful form of recursion on terms (as 
I will discuss in the following sections). Theoretically, this is a characterization
of terms as the initial algebra of an equational theory that involves only the most
fundamental term operations, namely the constructors and renaming. The equational
theory consists of the axioms of CE rensets (i.e., those of rensets plus (S1)-(S5)),
which are an infinite set of unconditional equations; for example, axiom (S5)
gives one equation for each pair of distinct variables y, z.

It is instructive to compare this characterization with the one offered by 
nominal logic, namely by Theorem 2. To do this, one first needs a lemma: 


Lemma 9. Let $f : A \to B$ be a function between two nominal sets $\mathcal{A} = (A, \_[\_\wedge\_])$
and $\mathcal{B} = (B, \_[\_\wedge\_])$ and X a set of variables. Then f is supported by X if
and only if $f(a[x \wedge y]) = (f\,a)[x \wedge y]$ for all $a \in A$ and $x, y \in \mathsf{Var} \setminus X$.


Now Theorem 2 (with the variable avoidance set X taken to be $\emptyset$) can be
rephrased as an initiality statement, as I describe below. 

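Let us define a constructor-enriched nominal set (CE nominal set) to be
any tuple $\mathcal{A} = (A, \_[\_\wedge\_], \mathsf{Vr}^{\mathcal{A}}, \mathsf{Ap}^{\mathcal{A}}, \mathsf{Lm}^{\mathcal{A}})$ where $(A, \_[\_\wedge\_])$ is a nominal set and
$\mathsf{Vr}^{\mathcal{A}} : \mathsf{Var} \to A$, $\mathsf{Ap}^{\mathcal{A}} : A \to A \to A$, $\mathsf{Lm}^{\mathcal{A}} : \mathsf{Var} \to A \to A$ are operators on A
such that the following properties hold for all $a, a_1, a_2 \in A$ and $x, y, z \in \mathsf{Var}$: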


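(N1) $(\mathsf{Vr}^{\mathcal{A}}\, x)[y \wedge z] = \mathsf{Vr}^{\mathcal{A}}\,(x[y \wedge z])$

(N2) $(\mathsf{Ap}^{\mathcal{A}}\, a_1\, a_2)[y \wedge z] = \mathsf{Ap}^{\mathcal{A}}\,(a_1[y \wedge z])\,(a_2[y \wedge z])$

(N3) $(\mathsf{Lm}^{\mathcal{A}}\, x\, a)[y \wedge z] = \mathsf{Lm}^{\mathcal{A}}\,(x[y \wedge z])\,(a[y \wedge z])$

(N4) $x \mathrel{\#} \mathsf{Lm}^{\mathcal{A}}\, x\, a$, i.e., $\{y \in \mathsf{Var} \mid (\mathsf{Lm}^{\mathcal{A}}\, x\, a)[y \wedge x] \neq \mathsf{Lm}^{\mathcal{A}}\, x\, a\}$ is finite.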


The notion of CE nominal morphism is defined as the expected extension 
of that of nominal morphism: a function that commutes with swapping and the 
constructors. Let $\mathsf{Nom}_{\mathrm{CE}}$ be the category of CE nominal sets and morphisms.


Theorem 10 ([39], rephrased). $(\mathsf{Trm}, \_[\_\wedge\_], \mathsf{Vr}, \mathsf{Ap}, \mathsf{Lm})$ is the initial CE
nominal set, i.e., the initial object in $\mathsf{Nom}_{\mathrm{CE}}$.


The above theorem indeed corresponds exactly to Theorem 2 with $X = \emptyset$:


- the conditions (N1)-(N3) in the definition of CE nominal sets correspond (via
Lemma 9) to the constructors being supported by $\emptyset$

- (N4) is the freshness condition for binders

- initiality, i.e., the existence of a unique morphism, is the same as the existence
of the unique function $f : \mathsf{Trm} \to A$ stipulated in Theorem 2: commutation
with the constructors is the Theorem 2 conditions (i)-(iii), and commutation
with swapping means (via Lemma 9) f being supported by $\emptyset$.


Unlike the renaming-based characterization of terms (Theorem 8), the nom- 
inal logic characterization (Theorem 10) is not purely equational. This is due 
to a combination of two factors: (1) two of the axioms ((N4) and the Finite 
Support condition) referring to freshness and (2) the impossibility of expressing
freshness equationally from swapping. The problem seems fundamental, in that a 
nominal-style characterization does not seem to be expressible purely equation- 
ally. By contrast, while the freshness idea is implicit in the CE renset axioms, 
the freshness predicate itself is absent from Theorem 8. 


5.2 Barendregt-Enhanced Recursion Principle 


While Theorem 8 already gives a recursion principle, it is possible to improve it 
by incorporating Barendregt’s variable convention (in the style of Theorem 2): 


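Theorem 11. Let X be a finite set of variables, $(A, \_[\_/\_])$ a renset and $\mathsf{Vr}^{\mathcal{A}} : \mathsf{Var} \to A$,
$\mathsf{Ap}^{\mathcal{A}} : A \to A \to A$ and $\mathsf{Lm}^{\mathcal{A}} : \mathsf{Var} \to A \to A$ some functions that satisfy the
clauses (S1)-(S5) from the definition of CE renset, but only under the assumption
that $x, y, z \notin X$. Then there exists a unique function $f : \mathsf{Trm} \to A$ such that the
following hold:


(i) $f\,(\mathsf{Vr}\ x) = \mathsf{Vr}^{\mathcal{A}}\, x$ (ii) $f\,(\mathsf{Ap}\ t_1\ t_2) = \mathsf{Ap}^{\mathcal{A}}\,(f\,t_1)\,(f\,t_2)$
(iii) $f\,(\mathsf{Lm}\ x\ t) = \mathsf{Lm}^{\mathcal{A}}\, x\,(f\,t)$ if $x \notin X$ (iv) $f\,(t[y/z]) = (f\,t)[y/z]$ if $y, z \notin X$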


Proof Idea. The constructions in the proof of Theorem 8 can be adapted
to avoid clashing with the finite set of variables X. For example, the
clause for λ-abstraction in the inductive definition of the relation R becomes
$\dfrac{x \notin X \qquad R\ t\ a}{R\ (\mathsf{Lm}\ x\ t)\ (\mathsf{Lm}^{\mathcal{A}}\, x\, a)}$, and preservation of renaming and freshness are also formulated
to avoid X. Totality is still ensured thanks to the possibility of renaming
bound variables, in terms and inhabitants of A alike (via the modified axiom
(S5)).


The above theorem says that if the structure A is assumed to be “almost” a
CE renset, save for additional restrictions involving the avoidance of X, then there
exists a unique “almost”-morphism—satisfying the CE substitutive morphism 
conditions restricted so that the bound and renaming-participating variables 
avoid X. It is the renaming-based counterpart of the nominal Theorem 2. 

In regards to the relative expressiveness of these two recursion principles 
(Theorems 11 and 2), it seems difficult to find an example that is definable 
by one but not by the other. In particular, my principle can seamlessly define 
standard nominal examples [39,40] such as the length of a term, the counting
of λ-abstractions or of the free-variable occurrences, and term-for-variable
substitution; [45, Appendix A] gives details. However, as I am about to discuss,
I found an important class of examples where my renaming-based principle is 
significantly easier to deploy: that of interpreting syntax in semantic domains. 
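Two of these standard examples can be sketched as follows, under stated assumptions about the target domains. For counting λ-abstractions the target is the natural numbers with the identity as renaming; for counting free-variable occurrences it is finite multisets of variables (here Python's `collections.Counter`), renamed by moving multiplicities from the old to the new variable, with $\mathsf{Lm}^{\mathcal{A}}\,x$ deleting x. One can check that both choices satisfy the CE renset axioms, so Theorem 8 (or Theorem 11 with X empty) applies. The Python sketch computes the clauses by plain structural recursion on pre-terms and only tests, on examples, the alpha-invariance and renaming commutation that the theorem guarantees; all names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import count
from typing import Union

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: "Term"
    t2: "Term"

@dataclass(frozen=True)
class Lm:
    x: str
    t: "Term"

Term = Union[Vr, Ap, Lm]

def fv(t: Term) -> frozenset:
    if isinstance(t, Vr):
        return frozenset({t.x})
    if isinstance(t, Ap):
        return fv(t.t1) | fv(t.t2)
    return fv(t.t) - {t.x}

def fresh_var(avoid) -> str:
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def rename(t: Term, new: str, old: str) -> Term:
    """Capture-avoiding renaming t[new/old] (used only to test clause (iv))."""
    if isinstance(t, Vr):
        return Vr(new) if t.x == old else t
    if isinstance(t, Ap):
        return Ap(rename(t.t1, new, old), rename(t.t2, new, old))
    if t.x == old:
        return t
    if t.x == new:
        z = fresh_var(fv(t.t) | {new, old})
        return Lm(z, rename(rename(t.t, z, t.x), new, old))
    return Lm(t.x, rename(t.t, new, old))

def fold(t: Term, vr, ap, lm):
    """Plain structural recursion following clauses (i)-(iii) with X empty."""
    if isinstance(t, Vr):
        return vr(t.x)
    if isinstance(t, Ap):
        return ap(fold(t.t1, vr, ap, lm), fold(t.t2, vr, ap, lm))
    return lm(t.x, fold(t.t, vr, ap, lm))

# Target 1: natural numbers, with the identity as renaming.
def count_lm(t: Term) -> int:
    return fold(t, lambda x: 0, lambda m, n: m + n, lambda x, n: n + 1)

# Target 2: finite multisets of variables.
def ren_counter(c: Counter, new: str, old: str) -> Counter:
    """c[new/old]: move the multiplicity of old onto new."""
    out = Counter(c)
    n = out.pop(old, 0)
    if n:
        out[new] += n
    return out

def lm_counter(x: str, c: Counter) -> Counter:
    """Lm^A x c: the occurrences of x become bound, so drop them."""
    out = Counter(c)
    out.pop(x, None)
    return out

def fv_occ(t: Term) -> Counter:
    return fold(t, lambda x: Counter({x: 1}), lambda c, d: c + d, lm_counter)

t = Lm("x", Ap(Vr("x"), Ap(Vr("y"), Vr("y"))))
print(count_lm(t), fv_occ(t))   # 1 Counter({'y': 2})
```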


5.3 Extended Example: Semantic Interpretation 


Semantic interpretations, also known as denotations (or denotational semantics),
are pervasive in the meta-theory of logics and λ-calculi, for example when
interpreting first-order logic (FOL) formulas in FOL models, or untyped or
simply-typed λ-calculus or higher-order logic terms in specific models (such as
full-frame or Henkin models). In what follows, I will focus on λ-terms and Henkin
models, but the ideas discussed apply broadly to any kind of statically scoped
interpretation of terms or formulas involving binders.

Let D be a set and $\mathit{ap} : D \to D \to D$ and $\mathit{lm} : (D \to D) \to D$ be operators
modeling semantic notions of application and abstraction. An environment will
be a function $\xi : \mathsf{Var} \to D$. Given $x, y \in \mathsf{Var}$ and $d, e \in D$, let us write $\xi\langle x := d\rangle$
for ξ updated with value d for x (i.e., acting like ξ on all variables except for x
where it returns d); and let us write $\xi\langle x := d, y := e\rangle$ instead of $\xi\langle x := d\rangle\langle y := e\rangle$.

Say one wants to interpret terms in the semantic domain D in the context of 
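environments, i.e., define the function $\mathit{sem} : \mathsf{Trm} \to (\mathsf{Var} \to D) \to D$ that maps
syntactic to semantic constructs; e.g., one would like to have:


- $\mathit{sem}\,(\mathsf{Lm}\ x\ (\mathsf{Ap}\ x\ x))\,\xi = \mathit{lm}\,(d \mapsto \mathit{ap}\ d\ d)$ (regardless of ξ)
- $\mathit{sem}\,(\mathsf{Lm}\ x\ (\mathsf{Ap}\ x\ y))\,\xi = \mathit{lm}\,(d \mapsto \mathit{ap}\ d\ (\xi\, y))$ (assuming $x \neq y$)


where I use $d \mapsto \ldots$ to describe functions in $D \to D$, e.g., $d \mapsto \mathit{ap}\ d\ d$ is the
function sending every $d \in D$ to $\mathit{ap}\ d\ d$.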

The definition should therefore naturally go recursively by the clauses: 

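(1) $\mathit{sem}\,(\mathsf{Vr}\ x)\,\xi = \xi\, x$ (2) $\mathit{sem}\,(\mathsf{Ap}\ t_1\ t_2)\,\xi = \mathit{ap}\,(\mathit{sem}\ t_1\ \xi)\,(\mathit{sem}\ t_2\ \xi)$

(3) $\mathit{sem}\,(\mathsf{Lm}\ x\ t)\,\xi = \mathit{lm}\,(d \mapsto \mathit{sem}\ t\ (\xi\langle x := d\rangle))$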

Of course, since Trm is not a free datatype, these clauses do not work out of
the box, i.e., do not form a definition (yet); this is where binding-aware recursion
principles such as Theorems 11 and 2 could step in. I will next try them both.

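The three clauses above already determine constructor operations $\mathsf{Vr}^{I}$, $\mathsf{Ap}^{I}$
and $\mathsf{Lm}^{I}$ on the set of interpretations, $I = (\mathsf{Var} \to D) \to D$, namely:


- $\mathsf{Vr}^{I} : \mathsf{Var} \to I$ by $\mathsf{Vr}^{I}\, x\, \xi = \xi\, x$
- $\mathsf{Ap}^{I} : I \to I \to I$ by $\mathsf{Ap}^{I}\, i_1\, i_2\, \xi = \mathit{ap}\,(i_1\, \xi)\,(i_2\, \xi)$
- $\mathsf{Lm}^{I} : \mathsf{Var} \to I \to I$ by $\mathsf{Lm}^{I}\, x\, i\, \xi = \mathit{lm}\,(d \mapsto i\,(\xi\langle x := d\rangle))$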


To apply the renaming-based recursion principle from Theorem 11, one must 
further define a renaming operator on I. Since the only chance to successfully
apply this principle is if sem commutes with renaming, the definition should be
inspired by the question: How can $\mathit{sem}\,(t[y/x])$ be determined from $\mathit{sem}\ t$, y and
x? The answer is (4) $\mathit{sem}\,(t[y/x])\,\xi = (\mathit{sem}\ t)\,(\xi\langle x := \xi\, y\rangle)$, yielding an operator
$\_[\_/\_]^{I} : I \to \mathsf{Var} \to \mathsf{Var} \to I$ defined by $i[y/x]^{I}\,\xi = i\,(\xi\langle x := \xi\, y\rangle)$.
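The operators on I, together with the renaming operator, transcribe almost literally into code. In the hedged Python sketch below, the domain D is an assumption made purely for illustration: Python values, where `lm` embeds a Python function as itself and `ap` calls a function or otherwise builds a symbolic `("ap", d, e)` node. Environments are Python functions, `update(xi, x, d)` stands for $\xi\langle x := d\rangle$, and `rename_I` for $\_[\_/\_]^{I}$; these names, like `sem` and `xi0`, are hypothetical. The test checks clause (4) on sample terms; it does not verify the CE renset axioms.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Union

@dataclass(frozen=True)
class Vr:
    x: str

@dataclass(frozen=True)
class Ap:
    t1: "Term"
    t2: "Term"

@dataclass(frozen=True)
class Lm:
    x: str
    t: "Term"

Term = Union[Vr, Ap, Lm]

def fv(t: Term) -> frozenset:
    if isinstance(t, Vr):
        return frozenset({t.x})
    if isinstance(t, Ap):
        return fv(t.t1) | fv(t.t2)
    return fv(t.t) - {t.x}

def fresh_var(avoid) -> str:
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def rename(t: Term, new: str, old: str) -> Term:
    """Capture-avoiding renaming t[new/old] on syntax."""
    if isinstance(t, Vr):
        return Vr(new) if t.x == old else t
    if isinstance(t, Ap):
        return Ap(rename(t.t1, new, old), rename(t.t2, new, old))
    if t.x == old:
        return t
    if t.x == new:
        z = fresh_var(fv(t.t) | {new, old})
        return Lm(z, rename(rename(t.t, z, t.x), new, old))
    return Lm(t.x, rename(t.t, new, old))

Env = Callable[[str], object]

def update(xi: Env, x: str, d) -> Env:
    """ξ⟨x := d⟩: like ξ except that x is sent to d."""
    return lambda v: d if v == x else xi(v)

def ap(d, e):
    """Hypothetical semantic application on the toy domain."""
    return d(e) if callable(d) else ("ap", d, e)

def lm(f):
    """Hypothetical semantic abstraction: a function D -> D is its own denotation."""
    return f

def sem(t: Term, xi: Env):
    """Clauses (1)-(3)."""
    if isinstance(t, Vr):
        return xi(t.x)
    if isinstance(t, Ap):
        return ap(sem(t.t1, xi), sem(t.t2, xi))
    return lm(lambda d: sem(t.t, update(xi, t.x, d)))

def rename_I(i: Callable[[Env], object], new: str, old: str):
    """i[new/old]^I ξ = i(ξ⟨old := ξ new⟩)."""
    return lambda xi: i(update(xi, old, xi(new)))

def xi0(v: str) -> str:
    """Hypothetical environment: each variable denotes a symbol."""
    return "val_" + v

t = Ap(Vr("x"), Vr("z"))
lhs = sem(rename(t, "y", "x"), xi0)                  # sem(t[y/x]) ξ
rhs = rename_I(lambda e: sem(t, e), "y", "x")(xi0)   # (sem t)[y/x]^I ξ
print(lhs, rhs)  # ('ap', 'val_y', 'val_z') ('ap', 'val_y', 'val_z')
```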

It is not difficult to verify that $\mathcal{I} = (I, \_[\_/\_]^{I}, \mathsf{Vr}^{I}, \mathsf{Ap}^{I}, \mathsf{Lm}^{I})$ is a CE renset;
for example, Isabelle’s automatic methods discharge all the goals. This means
Theorem 11 (or, since here one doesn’t need Barendregt’s variable convention,
already Theorem 8) is applicable, and gives us a unique function sem that commutes
with the constructors, i.e., satisfies clauses (1)-(3) (which are instances of
the clauses (i)-(iii) from Theorem 11), and additionally commutes with renaming,
i.e., satisfies clause (4) (which is an instance of the clause (iv) from Theorem 11).

On the other hand, to apply nominal recursion for defining sem, one must 
identify a swapping operator on I. Similarly to the case of renaming, this identification
process is guided by the goal of determining $\mathit{sem}\,(t[x \wedge y])$ from $\mathit{sem}\ t$, x and
y, leading to (4') $\mathit{sem}\,(t[x \wedge y])\,\xi = \mathit{sem}\ t\,(\xi\langle x := \xi\, y, y := \xi\, x\rangle)$, which yields the
definition of $\_[\_\wedge\_]^{I}$ by $i[x \wedge y]^{I}\,\xi = i\,(\xi\langle x := \xi\, y, y := \xi\, x\rangle)$. However, as pointed
out by Pitts [39, §6.3] (in the slightly different context of interpreting simply-typed
λ-calculus), the nominal recursor (Theorem 2) does not directly apply
(hence neither does my reformulation based on CE nominal sets, Theorem 10).
This is because, in my terminology, the structure $\mathcal{I} = (I, \_[\_\wedge\_]^{I}, \mathsf{Vr}^{I}, \mathsf{Ap}^{I}, \mathsf{Lm}^{I})$
is not a CE nominal set. The problematic condition is FCB (the freshness condition
for binders), requiring that $x \mathrel{\#^{I}} (\mathsf{Lm}^{I}\, x\, i)$ holds for all $i \in I$. Expanding the
definition of $\#^{I}$ (the nominal definition of freshness from swapping, recalled in
Sect. 2.2) and the definitions of $\_[\_\wedge\_]^{I}$ and $\mathsf{Lm}^{I}$, one can see that $x \mathrel{\#^{I}} (\mathsf{Lm}^{I}\, x\, i)$
means the following:
$\mathit{lm}\,(d \mapsto i\,(\xi\langle x := \xi\, y, y := \xi\, x\rangle\langle x := d\rangle)) = \mathit{lm}\,(d \mapsto i\,(\xi\langle x := d\rangle))$, i.e.,
$\mathit{lm}\,(d \mapsto i\,(\xi\langle x := d, y := \xi\, x\rangle)) = \mathit{lm}\,(d \mapsto i\,(\xi\langle x := d\rangle))$, holds for all but a finite
number of variables y.

The only chance for the above to be true is if i, when applied to an envi- 
ronment, ignores the value of y in that environment for all but a finite number 
of variables y; in other words, i only analyzes the value of a finite number of 
variables in that environment; but this is not guaranteed to hold for arbitrary
elements $i \in I$. To repair this, Pitts engages in a form of induction-recursion [17],
carving out from I a smaller domain that is still large enough to interpret all
terms, then proving that both FCB and the other axioms hold for this restricted 
domain. It all works out in the end, but the technicalities are quite involved. 

Although FCB is not required by the renaming-based principle, note incidentally
that this condition would actually be true (and immediate to check) if
working with freshness defined not from swapping but from renaming. Indeed,
the renaming-based version of $x \mathrel{\#^{I}} (\mathsf{Lm}^{I}\, x\, i)$ says that $\mathit{lm}\,(d \mapsto i\,(\xi\langle x := \xi\, y\rangle\langle x := d\rangle)) = \mathit{lm}\,(d \mapsto i\,(\xi\langle x := d\rangle))$ holds for all y (or at least for some
$y \neq x$), which is immediate since $\xi\langle x := \xi\, y\rangle\langle x := d\rangle = \xi\langle x := d\rangle$. This further
illustrates the idea that semantic domains ‘favor’ renaming over swapping.
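The contrast can be observed on a finite stand-in. The Python sketch below is hypothetical throughout: it replaces Var by a small list `VARS`, tabulates `lm` on a finite sample of D so that results can be compared, and takes for i an interpretation that reads every variable of the environment. Renaming x to any y inside $\mathsf{Lm}^{I}\,x\,i$ leaves the result unchanged, whereas swapping x with y changes it for every $y \neq x$ in the list; over an infinite Var this is exactly the failure of the swapping-based FCB described above.

```python
from typing import Tuple

VARS = ["x", "y", "z", "w"]     # hypothetical finite stand-in for Var
SAMPLE_D = [0, 1, 2]            # hypothetical finite sample of the domain D

def update(xi, x, d):
    """ξ⟨x := d⟩"""
    return lambda v: d if v == x else xi(v)

def lm(f) -> Tuple:
    """Hypothetical abstraction: tabulate f on the sample, so results are comparable."""
    return tuple(f(d) for d in SAMPLE_D)

def Lm_I(x, i):
    """Lm^I x i ξ = lm(d ↦ i(ξ⟨x := d⟩))"""
    return lambda xi: lm(lambda d: i(update(xi, x, d)))

def rename_I(i, new, old):
    """i[new/old]^I ξ = i(ξ⟨old := ξ new⟩)"""
    return lambda xi: i(update(xi, old, xi(new)))

def swap_I(i, a, b):
    """i[a ∧ b]^I ξ = i(ξ⟨a := ξ b, b := ξ a⟩), both values read from the original ξ."""
    return lambda xi: i(update(update(xi, a, xi(b)), b, xi(a)))

def reads_everything(xi):
    """An interpretation i that inspects the value of every variable."""
    return tuple(xi(v) for v in VARS)

def xi0(v):
    return "e_" + v

L = Lm_I("x", reads_everything)
renaming_changes = [y for y in VARS if rename_I(L, y, "x")(xi0) != L(xi0)]
swapping_changes = [y for y in VARS if swap_I(L, y, "x")(xi0) != L(xi0)]
print(renaming_changes, swapping_changes)  # [] ['y', 'z', 'w']
```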

In conclusion, for interpreting syntax in semantic domains, my renaming- 
based recursor is trivial to apply, whereas the nominal recursor requires some 
fairly involved additional definitions and proofs. 


6 Conclusion and Related Work 


This paper introduced and studied rensets, contributing (1) theoretically, a min- 
imalistic equational characterization of the datatype of terms with bindings and 
(2) practically, an addition to the formal arsenal for manipulating syntax with 
bindings. It is part of a longstanding line of work by myself and collabora- 
tors on exploring convenient definition and reasoning principles for bindings 
[25,27,43,46,47], and will be incorporated into the ongoing implementation of a 
new Isabelle definitional package for binding-aware datatypes [12]. 


Initial Model Characterizations of the Terms Datatype. My results provide
a truly elementary characterization of terms with bindings, as an “ordinary”
datatype specified by the fundamental operations only (the constructors plus
variable-for-variable renaming) and some equations (those defining CE rensets).
As far as specification simplicity goes, this is “the next best thing” after a
completely free datatype such as those of natural numbers or lists.


                               | Fiore et al. [22] & Hofmann [29] | Pitts [39]                        | Urban et al. [57,56]    | Norrish [33]           | Popescu & Gunter [46]        | Gheri & Popescu [25] | This paper
Paradigm                       | nameless                         | nameful                           | nameful                 | nameful                | nameful                      | nameful              | nameful
Barendregt?                    | n/a                              | yes                               | yes                     | yes                    | no                           | no                   | yes
Underlying category            | Set^F                            | Set                               | Set                     | Set                    | Set                          | Set                  | Set
Required operations/relations  | ctors, rename, free-vars         | ctors, perm                       | ctors, perm             | ctors, swap, free-vars | ctors, term/var subst, fresh | ctors, swap, fresh   | ctors, rename
Required properties            | functoriality, naturality        | Horn clauses, fresh-def, fin-supp | Horn clauses, fresh-def | Horn clauses           | Horn clauses                 | Horn clauses         | equations


Fig. 1. Initial model characterizations of the datatype of terms with bindings. “ctors” =
“constructors”, “perm” = “permutation”, “fresh” = “the freshness predicate”, “fresh-def”
= “clause for defining the freshness predicate”, “fin-supp” = “Finite Support”

Figure 1 shows previous characterizations from the literature, in which terms 
with bindings are identified as an initial model (or algebra) of some kind. For 
each of these, I indicate (1) the employed reasoning paradigm, (2) whether the 
initiality/recursion theorem features an extension with Barendregt’s variable 
convention, (3) the underlying category (from where the carriers of the models 
are taken), (4) the operations and relations on terms to which the models must 
provide counterparts and (5) the properties required on the models. 

While some of these results enjoy elegant mathematical properties of intrinsic 
value, my main interest is in the recursors they enable, specifically in the ease of 
deploying these recursors. That is, I am interested in how easy it is in principle 
to organize the target domain as a model of the requested type, hence obtain 
the desired morphism, i.e., get the recursive definition done. By this measure, 
elementary approaches relying on standard FOL-like models whose carriers are
sets rather than presheaves have an advantage. Also, it seems intuitive that a
recursor is easier to apply if there are fewer operators, and fewer and structurally
simpler properties required on its models, although empirical evidence of
successfully deploying the recursor in practice should complement the simplicity
assessment, to ensure that simplicity is not bought at the price of expressiveness.

The first column in Fig. 1’s table contains an influential representative of the 
nameless paradigm: the result obtained independently by Fiore et al. [22] and 
Hofmann [29] characterizing terms as initial in the category of algebras over the 
presheaf topos $\mathsf{Set}^{\mathbb{F}}$, where $\mathbb{F}$ is the category of finite ordinals and functions
between them. The operators required by algebras are the constructors, as well 
as the free-variable operator (implicitly as part of the separation on levels) and 
the injective renamings (as part of the functorial structure). The algebra’s carrier 
is required to be a functor and the constructors to be natural transformations. 
There are several variations of this approach, e.g., [5,11,29], some implemented 
in proof assistants, e.g., [3,4,31]. 

The other columns refer to initiality results that are more closely related to 
mine. They take place within the nameful paradigm, and they all rely on ele- 
mentary models (with set carriers). Pitts’s already discussed nominal recursor 
[39] (based on previous work by Gabbay and Pitts [23]) employs the constructors 
and permutation (or swapping), and requires that its models satisfy some Horn 
clauses for constructors, permutation and freshness, together with the second- 
order properties that (1) define freshness from swapping and (2) express Finite 
Support. Urban et al.’s version [56,57] implemented in Isabelle/Nominal is an 
improvement of Pitts’s in that it removes the Finite Support requirement from 
the models—which is practically significant because it enables non-finitely sup- 
ported target domains for recursion. Norrish’s result [33] is explicitly inspired by 
nominal logic, but renounces the definability of the free-variable operator from 
swapping—with the price of taking both swapping and free-variables as primi- 
tives. My previous work with Gunter and Gheri takes as primitives either term- 
for-variable substitution and freshness [46] or swapping and freshness [25], and 
requires properties expressed by different Horn clauses (and does not explore a 
Barendregt dimension, like Pitts, Urban et al. and Norrish do). My previous focus 
on term-for-variable substitution [46] (as opposed to renaming, i.e., variable-for- 
variable substitution) impairs expressiveness—for example, the depth of a term 
is not definable using a recursor based on term-for-variable substitution because 
we cannot say how term-for-variable substitution affects the depth of a term 
based on its depth and that of the substitutee alone. My current result based 
on rensets keeps freshness out of the primitive operators base (like nominal logic 
does), and provides an unconditionally equational characterization using only 
constructors and renaming. The key to achieving this minimality is the simple 
expression of freshness from renaming in my axiomatization of rensets. In future 
work, I plan a systematic formal comparison of the relative expressiveness of all 
these nameful recursors. 
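To make the depth argument above concrete, here is a small Python sketch. It is not the formal development of this paper: it works on raw pre-terms with a naive capture-avoiding substitution, and the fresh-name scheme (v0, v1, ...) and all function names are assumptions made for illustration. It exhibits two terms of equal depth which, under the same term-for-variable substitution with the same substitutee, produce terms of different depths, whereas renaming a variable leaves the depth unchanged.

```python
from dataclasses import dataclass
from itertools import count
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Vr:
    x: str


@dataclass(frozen=True)
class Ap:
    t1: "Trm"
    t2: "Trm"


@dataclass(frozen=True)
class Lm:
    x: str
    t: "Trm"


Trm = Union[Vr, Ap, Lm]


def fv(t: Trm) -> FrozenSet[str]:
    """Free variables of a pre-term."""
    if isinstance(t, Vr):
        return frozenset({t.x})
    if isinstance(t, Ap):
        return fv(t.t1) | fv(t.t2)
    return fv(t.t) - {t.x}


def depth(t: Trm) -> int:
    if isinstance(t, Vr):
        return 1
    if isinstance(t, Ap):
        return 1 + max(depth(t.t1), depth(t.t2))
    return 1 + depth(t.t)


def subst(t: Trm, s: Trm, x: str) -> Trm:
    """Capture-avoiding term-for-variable substitution t[s/x]."""
    if isinstance(t, Vr):
        return s if t.x == x else t
    if isinstance(t, Ap):
        return Ap(subst(t.t1, s, x), subst(t.t2, s, x))
    if t.x == x or x not in fv(t.t):
        return t  # x is bound here or does not occur free: nothing to replace
    if t.x in fv(s):
        # Rename the binder to a fresh name so that free variables of s are not captured.
        avoid = fv(s) | fv(t.t) | {x}
        y = next(n for n in (f"v{i}" for i in count()) if n not in avoid)
        return Lm(y, subst(subst(t.t, Vr(y), t.x), s, x))
    return Lm(t.x, subst(t.t, s, x))


def rename(t: Trm, y: str, x: str) -> Trm:
    """Variable-for-variable renaming t[y/x]."""
    return subst(t, Vr(y), x)


# Equal input depths, different output depths: depth(t[s/x]) is not a
# function of depth(t) and depth(s) alone.
t1 = Ap(Vr("x"), Vr("y"))
t2 = Ap(Vr("z"), Vr("y"))
s = Ap(Vr("u"), Vr("u"))
print(depth(t1), depth(t2), depth(s))                      # 2 2 2
print(depth(subst(t1, s, "x")), depth(subst(t2, s, "x")))  # 3 2
```

By contrast, a renaming replaces a variable leaf by another variable leaf (possibly also renaming a binder), so depth(rename(t, y, x)) always equals depth(t); this is the kind of clause a renaming-based recursor can exploit.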


Recursors in Other Paradigms. Figure 1 focuses on nameful recursors, while
including only the Fiore et al./Hofmann recursor for the sake of a rough comparison
with the nameless approach. I should stress that such a comparison is necessarily
rough, since the nameless recursors do not give the same “payload” as the
nameful ones. This is because of the handling of bound variables. In the nameless
paradigm, the λ-constructor does not explicitly take a variable as an input, as
in Lm x t, i.e., does not have type $\mathsf{Var} \to \mathsf{Trm} \to \mathsf{Trm}$. Instead, the bindings
are indicated through nameless pointers to positions in a term. So the nameless
λ-constructor, let's call it NLm, takes only a term, as in NLm t, i.e., has type
$\mathsf{Trm} \to \mathsf{Trm}$ or a scope-safe (polymorphic or dependently-typed) variation of this,
e.g., $\prod_{n \in \mathbb{N}} \mathsf{Trm}_{n+1} \to \mathsf{Trm}_n$ [22,29] or $\prod_{\alpha \in \mathsf{type}} \mathsf{Trm}_{\alpha + \mathsf{unit}} \to \mathsf{Trm}_{\alpha}$ [5,11]. The
λ-constructor is of course matched by operators in the considered models, and this
shows up in the clauses of the functions f defined recursively on terms: instead
of a clause of the form f (Lm x t) = (expression depending on x and f t) from
the nameful paradigm, in the nameless paradigm one gets a clause of the form
f (NLm t) = (expression depending on f t). A nameless recursor is usually
easier to prove correct and easier to apply because the nameless constructor NLm
is free—whereas a nameful recursor must wrestle with the non-freeness of Lm, 
handled by verifying certain properties of the target models. However, once the 
definition is done, having nameful clauses pays off by allowing “textbook-style” 
proofs that stay close to the informal presentation of a calculus or logic, whereas 
with the nameless definition some additional index shifting bureaucracy is nec- 
essary. (See [9] for a detailed discussion, and [14] for a hybrid solution.) 

A comparison of nameful recursion with HOAS recursion is also generally 
difficult, since major HOAS frameworks such as Abella [7], Beluga [37] or 
Twelf [36] are developed within non-standard logical foundations, allowing a 
λ-constructor of type $(\mathsf{Trm} \to \mathsf{Trm}) \to \mathsf{Trm}$, which is not amenable to
typical well-foundedness-based recursion but requires some custom solutions (e.g.,
[21,50]). However, the weak HOAS variant [16,27] employs a constructor of the
form $\mathsf{WHLm} : (\mathsf{Var} \to \mathsf{Trm}) \to \mathsf{Trm}$ which is recursable, and in fact yields a
free datatype, let us call it WHTrm, one generated by $\mathsf{WHVr} : \mathsf{Var} \to \mathsf{WHTrm}$,
$\mathsf{WHAp} : \mathsf{WHTrm} \to \mathsf{WHTrm} \to \mathsf{WHTrm}$ and WHLm. WHTrm contains (natural
encodings of) all terms but also additional entities referred to as “exotic terms”.
Partly because of the exotic terms, this free datatype by itself is not very helpful 
for recursively defining useful functions on terms. But the situation is dramati- 
cally improved if one employs a variant of weak HOAS called parametric HOAS 
(PHOAS) [15], i.e., takes Var not as a fixed type but as a type parameter (type 
variable) and works with $\prod_{\mathsf{Var} \in \mathsf{type}} \mathsf{Trm}_{\mathsf{Var}}$; this enables many useful definitions
by choosing a suitable type Var (usually large enough to make the necessary dis- 
tinctions) and then performing standard recursion. The functions definable in 
the style of PHOAS seem to be exactly those definable via the semantic domain 
interpretation pattern (Sect. 5.3): Choosing the instantiation of Var to a type 
T corresponds to employing environments in $\mathsf{Var} \to T$. (I illustrate this at the
end of [45, Appendix A] by showing the semantic-domain version of a PHOAS 
example.) 

As a hybrid nameful/HOAS approach we can count Gordon and Melham’s 
characterization of the datatype of terms [26], which employs the nameful con- 
structors but formulates recursion treating Lm as if recursing in the weak-HOAS 
datatype WHTrm. Norrish's recursor [33] (a participant in Fig. 1) was derived
from Gordon and Melham's. Weak-HOAS recursion also has interesting
connections with nameless recursion: In presheaf toposes such as those
employed by Fiore et al. [22], Hofmann [29] and Ambler et al. [6], for any object 
T the function space $\mathsf{Var} \Rightarrow T$ is isomorphic to the De Bruijn level shifting
transformation applied to T; this effectively equates the weak-HOAS and nameless
recursors. A final cross-paradigm note: In themselves, nominal sets are not
confined to the nameful paradigm; their category is equivalent [23] to the Schanuel
topos [30], which is attractive for pursuing the nameless approach.


Axiomatizations of Renaming. In his study of name-passing process calculi, 
Staton [52] considers an enrichment of nominal sets with renaming (in addition 
to swapping) and axiomatizes renaming with the help of the nominal (swapping- 
defined) freshness predicate. He shows that the resulting category is equivalent to
the non-injective renaming counterpart of the Schanuel topos (i.e., the
subcategory of $\mathsf{Set}^{\mathbb{F}}$ consisting of functors that preserve pullbacks of monos). Gabbay
and Hofmann [24] provide an elementary characterization of the above category, 
in terms of nominal renaming sets, which are sets equipped with a multiple- 
variable-renaming action satisfying identity and composition laws, and a form 
of Finite Support (FS). Nominal renaming sets seem closely related to rensets
satisfying FS. Indeed, any nominal renaming set forms an FS-satisfying renset
when restricted to single-variable renaming. Conversely, I conjecture that any 
FS-satisfying renset gives rise to a nominal renaming set. This correspondence 
seems similar to the one between the permutation-based and swapping-based 
alternative axiomatizations of nominal sets—in that the two express the same 
concept up to an isomorphism of categories. In their paper, Gabbay and Hof- 
mann do not study renaming-based recursion, beyond noting the availability of a 
recursor stemming from the functor-category view (which, as I discussed above, 
enables nameless recursion with a weak-HOAS flavor). Pitts [41] introduces nom- 
inal sets with 01-substitution structure, which axiomatize substitution of one of 
two possible constants for variables on top of the nominal axiomatization, and 
proves that they form a category that is equivalent to that of cubical sets [10],
hence relevant for the univalent foundations [54]. 


Other Work. Sun [53] develops universal algebra for first-order languages with 
bindings (generalizing work by Aczel [2]) and proves a completeness theorem. In 
joint work with Roşu [48], I develop first-order logic and prove completeness on 
top of a generic syntax with axiomatized free-variables and substitution. 


Renaming Versus Swapping and Nominal Logic, Final Round. I believe 
that my work complements rather than competes with nominal logic. My results 
do not challenge the swapping-based approach to defining syntax (defining the 
alpha-equivalence on pre-terms and quotienting to obtain terms) recommended 
by nominal logic, which is more elegant than a renaming-based alternative; but 
my easier-to-apply recursor can be a useful addition even on top of the nominal 
substratum. Moreover, some of my constructions are explicitly inspired by the 
nominal ones. For example, I started by adapting the nominal idea of defining 
freshness from swapping before noticing that renaming enables a simpler formu- 
lation. My formal treatment of Barendregt’s variable convention also originates 
from nominal logic—as it turns out, this idea works equally well in my setting. 
In fact, I came to believe that the possibility of a Barendregt enhancement is 
largely orthogonal to the particularities of a binding-aware recursor. In future 
work, I plan to investigate this, i.e., seek general conditions under which an 
initiality principle (such as Theorems 10 and 8) is amenable to a Barendregt 
enhancement (such as Theorems 2 and 11, respectively). 
Abstract. The characterizing properties of a proof-theoretical presentation
of a given logic may hang on the choice of proof formalism, on
the shape of the logical rules and of the sequents manipulated by a given 
proof system, on the underlying notion of consequence, and even on the 
expressiveness of its linguistic resources and on the logical framework into 
which it is embedded. Standard (one-dimensional) logics determined by 
(non-deterministic) logical matrices are known to be axiomatizable by 
analytic and possibly finite proof systems as soon as they turn out to 
satisfy a certain constraint of sufficient expressiveness. In this paper we 
introduce a recipe for cooking up a two-dimensional logical matrix (or 
B-matrix) by the combination of two (possibly partial) non-deterministic 
logical matrices. We will show that such a combination may result in B- 
matrices satisfying the property of sufficient expressiveness, even when 
the input matrices are not sufficiently expressive in isolation, and we will 
use this result to show that one-dimensional logics that are not finitely 
axiomatizable may inhabit finitely axiomatizable two-dimensional logics, 
becoming, thus, finitely axiomatizable by the addition of an extra dimen- 
sion. We will illustrate the said construction using a well-known logic of 
formal inconsistency called mCi. We will first prove that this logic is not 
finitely axiomatizable by a one-dimensional (generalized) Hilbert-style 
system. Then, taking advantage of a known 5-valued non-deterministic 
logical matrix for this logic, we will combine it with another one, conve- 
niently chosen so as to give rise to a B-matrix that is axiomatized by a 
two-dimensional Hilbert-style system that is both finite and analytic. 


Keywords: Hilbert-style proof systems · finite axiomatizability ·
consequence relations · non-deterministic semantics · paraconsistency


1 Introduction 


A logic is commonly defined nowadays as a relation that connects collections 
of formulas from a formal language and satisfies some closure properties. The 
established connections are called consecutions and each of them has two parts,
an antecedent and a succedent, the latter often being said to ‘follow from’ (or 
to be a consequence of) the former. A logic may be manufactured in a number 
of ways, in particular as being induced by the set of derivations justified by 
the rules of inference of a given proof system. There are different kinds of proof 
systems, the differences between them residing mainly in the shapes of their rules
of inference and in the way derivations are built. We will be interested here in
Hilbert-style proof systems (‘H-systems’, for short), whose rules of inference have
the same shape as the consecutions of the logic they canonically induce and whose
associated derivations consist in expanding a given antecedent by applications of 
rules of inference until the desired succedent is produced. A remarkable property 
of an H-system is that the logic induced by it is the least logic containing the 
rules of inference of the system; in the words of [24], the system constitutes a 
‘logical basis’ for the said logic. 

Conventional H-systems, which we here dub ‘SET-FMLA H-systems’, do not 
allow for more than one formula in the succedents of the consecutions that they 
manipulate. Since [23], however, we have learned that the simple elimination of 
this restriction on H-systems —that is, allowing for sets of formulas rather than 
single formulas in the succedents— brings numerous advantages, among which 
we mention: modularity (correspondence between rules of inference and proper- 
ties satisfied by a semantical structure), analyticity (control over the resources 
demanded to produce a derivation), and the automatic generation of analytic 
proof systems for a wide class of logics specified by sufficiently expressive
non-deterministic semantics, with an associated straightforward proof-search
procedure [13,18]. Such generalized systems, here dubbed ‘SET-SET H-systems’,
induce logics whose consecutions involve succedents consisting of a collection of
formulas, intuitively understood as ‘alternative conclusions’.

An H-system $\mathsf{R}$ is said to be an axiomatization for a given logic $\mathcal{L}$ when the
logic induced by $\mathsf{R}$ coincides with $\mathcal{L}$. A desirable property for an axiomatization
is finiteness, namely the property of consisting of a finite collection of schematic
axioms and rules of inference. A logic having a finite axiomatization is said to
be ‘finitely based’. In the literature, one may find examples of logics having a 
quite simple, finite semantic presentation, being, in contrast, not finitely based 
in terms of SET-FMLA H-systems [21]. These very logics, however, when seen 
as companions of logics with multiple formulas in the succedent, turn out to be 
finitely based in terms of SET-SET H-systems [18]. In other words, by updating 
the underlying proof-theoretical and the logical formalisms, we are able to obtain 
a finite axiomatization for logics which in a more restricted setting could not be 
said to be finitely based. We may compare the above mentioned movement to 
the common mathematical practice of adding dimensions in order to provide 
better insight into some phenomenon. A well-known example of that is given by
the Fundamental Theorem of Algebra, which provides an elegant solution to the
problem of determining the roots of polynomials in a single variable, demanding
only that real coefficients should be replaced by complex coefficients. Another
example, from Machine Learning, is the ‘kernel trick’ employed in support vector 
machines: by increasing the dimensionality of the input space, the transformed 
data points become more easily separable by hyperplanes, making it possible to
achieve better results in classification tasks. 

It is worth noting that there are logics that fail to be finitely based in terms 
of SET-SET H-systems. An example of a logic designed with the sole purpose of 
illustrating this possibility was provided in [18]. One of the goals of the present 
work is to show that an important logic from the literature of logics of formal 
inconsistency (LFIs) called mCi is also an example of this phenomenon. This 
logic results from adding infinitely many axiom schemas to the logic mbC, a logic
that is obtained by extending positive classical logic with two axiom schemas.
Incidentally, in the course of proving this result, we will show that mCi is the limit of a
strictly increasing chain of LFIs extending mbC (comparable to the case of Clim 
in da Costa’s hierarchy of increasingly weaker paraconsistent calculi [16]). A nat- 
ural question, then, is whether we can enrich our technology, in the same vein, in 
order to provide finite axiomatizations for all these logics. We answer that in the 
affirmative by means of the two-dimensional frameworks developed in [11,17]. 
Logics, in this case, connect pairs of collections of formulas. A consecution, in 
this setting, may be read as involving formulas that are accepted and those that 
are not, as well as formulas that are rejected and those that are not. ‘Acceptance’
and ‘rejection’ are thus seen as two orthogonal dimensions that may
interact, making it possible to express more complex consecutions than
those expressible in one-dimensional logics. Two-dimensional H-systems, which
we call ‘SET²-SET² H-systems’, generalize SET-SET H-systems so as to manipulate
pairs of collections of formulas, canonically inducing two-dimensional logics
and constituting logical bases for them. Another goal of the present work is,
therefore, to show how to obtain a two-dimensional logic inhabited by a (possibly
not finitely based) one-dimensional logic of interest. More than that, the logic we
obtain will be finitely axiomatizable in terms of a SET²-SET² analytic H-system.
The only requirements are that the one-dimensional logic of interest must have
an associated semantics in terms of a finite non-deterministic logical matrix and 
that this matrix can be combined with another one through a novel procedure 
that we will introduce, resulting in a two-dimensional non-deterministic matrix 
(a B-matrix [9]) satisfying a certain condition of sufficient expressiveness [17]. 
An application of this approach will be provided here in order to produce the 
first finite and analytic axiomatization of mCi. 

The paper is organized as follows: Sect. 2 introduces basic terminology and
definitions regarding algebras and languages. Section 3 presents the notions of
one-dimensional logics and SET-SET H-systems. Section 4 proves that mCi is
not finitely axiomatizable by one-dimensional H-systems. Section 5 introduces
two-dimensional logics and H-systems, and describes the approach to extending 
a logical matrix to a B-matrix with the goal of finding a finite two-dimensional 
axiomatization for the logic associated with the former. Section 6 presents a two- 
dimensional finite analytic H-system for mCi. In the final remarks, we highlight 
some byproducts of our present approach and some features of the resulting 
proof systems, in addition to pointing to some directions for further research.¹


¹ Detailed proofs of some results may be found in https://arxiv.org/abs/2205.08920.
2 Preliminaries


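A propositional signature is a family $\Sigma := \{\Sigma_k\}_{k \in \omega}$, where each $\Sigma_k$ is a collection
of $k$-ary connectives. We say that $\Sigma$ is finite when its base set $\bigcup_{k \in \omega} \Sigma_k$ is
finite. A non-deterministic algebra over $\Sigma$, or simply $\Sigma$-nd-algebra, is a structure
$\mathbf{A} := \langle A, \cdot_{\mathbf{A}} \rangle$, such that $A$ is a non-empty collection of values called the carrier
of $\mathbf{A}$, and, for each $k \in \omega$ and $\copyright \in \Sigma_k$, the multifunction $\copyright_{\mathbf{A}} : A^k \to \mathcal{P}(A)$
is the interpretation of $\copyright$ in $\mathbf{A}$. When $\Sigma$ and $A$ are finite, we say that $\mathbf{A}$ is
finite. When the range of all interpretations of $\mathbf{A}$ contains only singletons, $\mathbf{A}$
is said to be a deterministic algebra over $\Sigma$, or simply a $\Sigma$-algebra, meeting
the usual definition from Universal Algebra [12]. When $\varnothing$ is not in the range
of any $\copyright_{\mathbf{A}}$, $\mathbf{A}$ is said to be total. Given a $\Sigma$-algebra $\mathbf{A}$ and a $\copyright \in \Sigma_1$, we
let $\copyright^{0}_{\mathbf{A}}(x) := x$ and $\copyright^{i+1}_{\mathbf{A}}(x) := \copyright_{\mathbf{A}}(\copyright^{i}_{\mathbf{A}}(x))$. A mapping $h : A \to B$ is a
homomorphism from $\mathbf{A}$ to $\mathbf{B}$ when, for all $k \in \omega$, $\copyright \in \Sigma_k$ and $x_1, \ldots, x_k \in A$, we
have $h[\copyright_{\mathbf{A}}(x_1, \ldots, x_k)] \subseteq \copyright_{\mathbf{B}}(h(x_1), \ldots, h(x_k))$. The set of all homomorphisms
from $\mathbf{A}$ to $\mathbf{B}$ is denoted by $\mathrm{Hom}_\Sigma(\mathbf{A}, \mathbf{B})$. When $\mathbf{B} = \mathbf{A}$, we write $\mathrm{End}_\Sigma(\mathbf{A})$,
rather than $\mathrm{Hom}_\Sigma(\mathbf{A}, \mathbf{A})$, for the set of endomorphisms on $\mathbf{A}$.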

Let $P$ be a denumerable collection of propositional variables and $\Sigma$ be a
propositional signature. The absolutely free $\Sigma$-algebra freely generated by $P$ is
denoted by $\mathbf{L}_\Sigma(P)$ and called the $\Sigma$-language generated by $P$. The elements of
$L_\Sigma(P)$ are called $\Sigma$-formulas, and those among them that are not propositional
variables are called $\Sigma$-compounds. Given $\Phi \subseteq L_\Sigma(P)$, we denote by $\overline{\Phi}$ the set
$L_\Sigma(P) \setminus \Phi$. The homomorphisms from $\mathbf{L}_\Sigma(P)$ to $\mathbf{A}$ are called valuations on $\mathbf{A}$,
and we denote by $\mathrm{Val}_\Sigma(\mathbf{A})$ the collection thereof. Additionally, endomorphisms
on $\mathbf{L}_\Sigma(P)$ are dubbed $\Sigma$-substitutions, and we let $\mathrm{Subs}^{\Sigma} := \mathrm{End}_\Sigma(\mathbf{L}_\Sigma(P))$; when
there is no risk of confusion, we may omit the superscript from this notation.

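Given $\varphi \in L_\Sigma(P)$, let $\mathsf{props}(\varphi)$ be the set of propositional variables occurring
in $\varphi$. If $\mathsf{props}(\varphi) = \{p_1, \ldots, p_k\}$, we say that $\varphi$ is $k$-ary (unary, for $k = 1$; binary,
for $k = 2$) and let $\varphi_{\mathbf{A}} : A^k \to \mathcal{P}(A)$ be the $k$-ary multifunction on $\mathbf{A}$ induced
by $\varphi$, where, for all $x_1, \ldots, x_k \in A$, we have $\varphi_{\mathbf{A}}(x_1, \ldots, x_k) := \{v(\varphi) \mid v \in
\mathrm{Val}_\Sigma(\mathbf{A}) \text{ and } v(p_i) = x_i, \text{ for } 1 \leq i \leq k\}$. Moreover, given $\psi_1, \ldots, \psi_k \in L_\Sigma(P)$,
we write $\varphi(\psi_1, \ldots, \psi_k)$ for the $\Sigma$-formula $\varphi_{\mathbf{L}_\Sigma(P)}(\psi_1, \ldots, \psi_k)$, and, where $\Phi \subseteq
L_\Sigma(P)$ is a set of $k$-ary $\Sigma$-formulas, we let $\Phi(\psi_1, \ldots, \psi_k) := \{\varphi(\psi_1, \ldots, \psi_k) \mid \varphi \in
\Phi\}$. Given $\varphi \in L_\Sigma(P)$, by $\mathsf{subf}(\varphi)$ we refer to the set of subformulas of $\varphi$. Where
$\theta$ is a unary $\Sigma$-formula, we define the set $\mathsf{subf}^{\theta}(\varphi)$ as $\{\sigma(\theta) \mid \sigma : P \to \mathsf{subf}(\varphi)\}$.
Given a set $\Theta \supseteq \{p\}$ of unary $\Sigma$-formulas, we set $\mathsf{subf}^{\Theta}(\varphi) := \bigcup_{\theta \in \Theta} \mathsf{subf}^{\theta}(\varphi)$.
For example, if $\Theta = \{p, \neg p\}$, we will have $\mathsf{subf}^{\Theta}(\neg(q \vee r)) = \{q, r, q \vee r, \neg(q \vee
r)\} \cup \{\neg q, \neg r, \neg(q \vee r), \neg\neg(q \vee r)\}$. This generalized notion of subformulas will
be used in the next section to provide a more generous proof-theoretical concept
of analyticity.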
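The generalized subformula operator can be computed directly from its definition. In the Python sketch below, the tuple encoding of formulas and the function names are hypothetical, and every θ in Θ is assumed to be unary in the variable p; since θ then contains only p, each σ(θ) is fixed by the choice of σ(p) ∈ subf(φ). On the example above it yields 7 formulas rather than 8, because ¬(q ∨ r) belongs to both sets of the union.

```python
from typing import FrozenSet, Iterable, Tuple, Union

# Hypothetical encoding: variables are strings, compounds are tuples
# (connective, arg1, ..., argk), e.g. ("neg", ("or", "q", "r")) for ¬(q ∨ r).
Formula = Union[str, Tuple]


def subf(phi: Formula) -> FrozenSet[Formula]:
    """Ordinary subformulas of phi, including phi itself."""
    if isinstance(phi, str):
        return frozenset({phi})
    result = {phi}
    for arg in phi[1:]:
        result |= subf(arg)
    return frozenset(result)


def plug(theta: Formula, psi: Formula, var: str = "p") -> Formula:
    """sigma(theta) for a substitution with sigma(var) = psi; theta is unary in var."""
    if isinstance(theta, str):
        return psi if theta == var else theta
    return (theta[0],) + tuple(plug(a, psi, var) for a in theta[1:])


def subf_Theta(Theta: Iterable[Formula], phi: Formula, var: str = "p") -> FrozenSet[Formula]:
    """subf^Theta(phi): each theta in Theta instantiated with each subformula of phi."""
    return frozenset(plug(theta, psi, var) for theta in Theta for psi in subf(phi))


phi = ("neg", ("or", "q", "r"))   # ¬(q ∨ r)
Theta = ["p", ("neg", "p")]       # Θ = {p, ¬p}
print(len(subf_Theta(Theta, phi)))  # 7
```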


3 One-Dimensional Consequence Relations 


A SET-SET statement (or sequent) is a pair $(\Phi, \Psi) \in \mathcal{P}(L_\Sigma(P)) \times \mathcal{P}(L_\Sigma(P))$,
where $\Phi$ is dubbed the antecedent and $\Psi$ the succedent. A one-dimensional
consequence relation on $L_\Sigma(P)$ is a collection $\rhd$ of SET-SET statements satisfying,
for all $\Phi, \Psi, \Phi', \Psi' \subseteq L_\Sigma(P)$,


(O) if $\Phi \cap \Psi \neq \varnothing$, then $\Phi \rhd \Psi$
(D) if $\Phi \rhd \Psi$, then $\Phi \cup \Phi' \rhd \Psi \cup \Psi'$
(C) if $\Pi \cup \Phi \rhd \Psi \cup \overline{\Pi}$ for all $\Pi \subseteq L_\Sigma(P)$, then $\Phi \rhd \Psi$


Properties (O), (D) and (C) are called overlap, dilution and cut, respectively.
The relation $\rhd$ is called substitution-invariant when it satisfies, for every $\sigma \in
\mathrm{Subs}^{\Sigma}$,


(S) if $\Phi \rhd \Psi$, then $\sigma[\Phi] \rhd \sigma[\Psi]$
and it is called finitary when it satisfies
(F) if $\Phi \rhd \Psi$, then $\Phi_f \rhd \Psi_f$ for some finite $\Phi_f \subseteq \Phi$ and $\Psi_f \subseteq \Psi$

One-dimensional consequence relations will also be referred to as one-dimensional
logics. Substitution-invariant finitary one-dimensional logics will be called
standard. We will denote by $\not\rhd$ the complement of $\rhd$, called the compatibility
relation associated with $\rhd$ [10].

A SET-FMLA statement is a sequent having a single formula as succedent.
When we restrict standard consequence relations to collections of SET-FMLA
statements, we define the so-called (substitution-invariant finitary) Tarskian
consequence relations. Every one-dimensional consequence relation $\rhd$ determines a
Tarskian consequence relation $\vdash_{\rhd} \subseteq \mathcal{P}(L_\Sigma(P)) \times L_\Sigma(P)$, dubbed the SET-FMLA
Tarskian companion of $\rhd$, such that, for all $\Phi \cup \{\psi\} \subseteq L_\Sigma(P)$, $\Phi \vdash_{\rhd} \psi$ if,
and only if, $\Phi \rhd \{\psi\}$. It is well-known that the collection of all Tarskian
consequence relations over a fixed language constitutes a complete lattice under
set-theoretical inclusion [25]. Given a set $\mathcal{C}$ of such relations, we will denote by
$\bigsqcup \mathcal{C}$ its supremum in the latter lattice.

We present in what follows two ways of obtaining one-dimensional conse- 
quence relations: one semantical, via non-deterministic logical matrices [6], and 
the other proof-theoretical, via SET-SET Hilbert-style systems [18, 23]. 

A non-deterministic $\Sigma$-matrix, or simply $\Sigma$-nd-matrix, is a structure $\mathbb{M} :=
\langle \mathbf{A}, D \rangle$, where $\mathbf{A}$ is a $\Sigma$-nd-algebra, whose carrier is the set of truth-values, and
$D \subseteq A$ is the set of designated truth-values. Such structures are also known in
the literature as ‘PNmatrices’ [7]; they generalize the so-called ‘Nmatrices’ [5],
which are $\Sigma$-nd-matrices with the restriction that $\mathbf{A}$ must be total. From now on,
whenever $X \subseteq A$, we denote $A \setminus X$ by $\overline{X}$. In case $\mathbf{A}$ is deterministic, we simply
say that $\mathbb{M}$ is a $\Sigma$-matrix. Also, $\mathbb{M}$ is said to be finite when $\mathbf{A}$ is finite. Every
$\Sigma$-nd-matrix $\mathbb{M}$ determines a substitution-invariant one-dimensional consequence
relation over $\Sigma$, denoted by $\rhd_{\mathbb{M}}$, such that $\Phi \rhd_{\mathbb{M}} \Psi$ if, and only if, for all $v \in
\mathrm{Val}_\Sigma(\mathbf{A})$, $v[\Phi] \cap \overline{D} \neq \varnothing$ or $v[\Psi] \cap D \neq \varnothing$. It is worth noting that $\rhd_{\mathbb{M}}$ is finitary
whenever the carrier of $\mathbf{A}$ is finite (the proof is very similar to that of the
same result for Nmatrices [5, Theorem 3.15]).
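For a finite nd-matrix, the definition of $\rhd_{\mathbb{M}}$ suggests a brute-force check on finite statements: enumerate the assignments on the subformulas of $\Phi \cup \Psi$ that respect the multifunctions, and look for one that designates every formula in $\Phi$ and no formula in $\Psi$. The Python sketch below assumes the nd-matrix is total (an Nmatrix), so that each such restricted assignment extends to a full valuation; for partial nd-matrices that extension can fail and a reported counter-valuation might be spurious. The encoding of formulas and connectives, and the two example matrices, are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple, Union

# Hypothetical encoding: variables are strings; compounds are tuples
# (connective, arg1, ..., argk); each connective maps to a multifunction.
Formula = Union[str, Tuple]
Ops = Dict[str, Callable[..., Set]]


def subformulas(phi: Formula) -> Set[Formula]:
    if isinstance(phi, str):
        return {phi}
    out = {phi}
    for arg in phi[1:]:
        out |= subformulas(arg)
    return out


def size(phi: Formula) -> int:
    return 1 if isinstance(phi, str) else 1 + sum(size(a) for a in phi[1:])


def valuations(formulas: List[Formula], values: Iterable, ops: Ops) -> Iterator[Dict]:
    """All assignments on a subformula-closed list (arguments listed before
    compounds) that respect the multifunctions in ops."""
    values = list(values)
    v: Dict = {}

    def extend(i: int) -> Iterator[Dict]:
        if i == len(formulas):
            yield dict(v)
            return
        phi = formulas[i]
        choices = values if isinstance(phi, str) else ops[phi[0]](*(v[a] for a in phi[1:]))
        for val in choices:
            v[phi] = val
            yield from extend(i + 1)
        v.pop(phi, None)

    yield from extend(0)


def entails(Phi, Psi, values, designated, ops: Ops) -> bool:
    """Phi |>_M Psi: no valuation designates all of Phi while designating none of Psi."""
    closure: Set[Formula] = set()
    for phi in [*Phi, *Psi]:
        closure |= subformulas(phi)
    for v in valuations(sorted(closure, key=size), values, ops):
        if all(v[f] in designated for f in Phi) and all(v[f] not in designated for f in Psi):
            return False  # found a counter-valuation
    return True


BOOL = [0, 1]
CL = {"neg": lambda x: {1 - x}, "or": lambda x, y: {max(x, y)}}  # classical, deterministic
ND = {"neg": lambda x: {0, 1}}  # a fully non-deterministic negation
print(entails([("or", "p", "q")], ["p", "q"], BOOL, {1}, CL))  # True
print(entails(["p", ("neg", "p")], [], BOOL, {1}, ND))         # False
```

Sorting the closure by formula size is a simple way to guarantee that every argument receives its value before the compound that uses it.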

A strong homomorphism between $\Sigma$-matrices $\mathbb{M}_1 := \langle \mathbf{A}_1, D_1 \rangle$ and $\mathbb{M}_2 :=
\langle \mathbf{A}_2, D_2 \rangle$ is a homomorphism $h$ between $\mathbf{A}_1$ and $\mathbf{A}_2$ such that $x \in D_1$ if, and
only if, $h(x) \in D_2$. When there is a surjective strong homomorphism between
$\mathbb{M}_1$ and $\mathbb{M}_2$, we have that $\rhd_{\mathbb{M}_1} = \rhd_{\mathbb{M}_2}$.
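The two defining conditions of a strong homomorphism can be checked mechanically for finite matrices. The sketch below is illustrative only: encoding connectives as (arity, multifunction) pairs is an assumption, it allows non-deterministic interpretations, and the example matrices are made up (a three-valued negation whose two designated values, "1" and "t", both collapse onto the classical value 1).

```python
from itertools import product
from typing import Callable, Dict, Hashable, Iterable, Mapping, Set, Tuple

# Hypothetical encoding: connective name -> (arity, multifunction returning a set).
Ops = Dict[str, Tuple[int, Callable[..., Set[Hashable]]]]


def is_strong_homomorphism(h: Mapping, A1: Iterable, D1: Set, ops1: Ops, D2: Set, ops2: Ops) -> bool:
    A1 = list(A1)
    # (1) x is designated in M1 iff h(x) is designated in M2.
    if any((x in D1) != (h[x] in D2) for x in A1):
        return False
    # (2) homomorphism: h[c_1(x1, ..., xk)] is included in c_2(h(x1), ..., h(xk)).
    for name, (arity, f1) in ops1.items():
        _, f2 = ops2[name]
        for args in product(A1, repeat=arity):
            if not {h[y] for y in f1(*args)} <= set(f2(*(h[a] for a in args))):
                return False
    return True


NEG1 = {"0": {"1", "t"}, "1": {"0"}, "t": {"0"}}  # "t" is a second designated value
NEG2 = {"0": {"1"}, "1": {"0"}}                   # classical negation
ops1 = {"neg": (1, lambda x: NEG1[x])}
ops2 = {"neg": (1, lambda x: NEG2[x])}
A1, D1, A2, D2 = ["0", "1", "t"], {"1", "t"}, ["0", "1"], {"1"}

h = {"0": "0", "1": "1", "t": "1"}    # collapses "t" onto "1"
bad = {"0": "0", "1": "1", "t": "0"}  # sends a designated value to an undesignated one
print(is_strong_homomorphism(h, A1, D1, ops1, D2, ops2), set(h.values()) == set(A2))  # True True
print(is_strong_homomorphism(bad, A1, D1, ops1, D2, ops2))                            # False
```

Since h is also surjective here, the result quoted above would give that the two matrices induce the same consequence relation.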

Now, to the Hilbert-style systems. A (schematic) SET-SET rule of inference
$R_s$ is the collection of all substitution instances of the SET-SET statement
$s$, called the schema of $R_s$. Each $r \in R_s$ is called a rule instance of $R_s$. A
(schematic) SET-SET H-system $\mathsf{R}$ is a collection of SET-SET rules of inference.
When we constrain the rule instances of $\mathsf{R}$ to having only singletons as
succedents, we obtain the conventional notion of Hilbert-style system, called here
SET-FMLA H-system.

An $\mathsf{R}$-derivation in a SET-SET H-system $\mathsf{R}$ is a rooted directed tree $\mathsf{t}$ such
that every node is labelled with sets of formulas or with a discontinuation
symbol $\ast$, and in which every non-leaf node (that is, a node with child nodes) $n$ in $\mathsf{t}$
is expanded by a rule instance $r$ of $\mathsf{R}$. This means that the antecedent
of $r$ is contained in the label of $n$ and that $n$ has exactly one child node for
each formula $\psi$ in the succedent of $r$. These child nodes are, in turn, labelled
with the same formulas as those of $n$ plus the respective formula $\psi$. In case $r$
has an empty succedent, then $n$ has a single child node labelled with $\ast$. Here we
will consider only finitary SET-SET H-systems, in which each rule instance has
finite antecedent and succedent. In such cases, we only need to consider finite
derivations. Figure 1 illustrates how derivations using only finitary rules of
inference may be graphically represented. We denote by $\ell^{\mathsf{t}}(n)$ the label of the node
$n$ in the tree $\mathsf{t}$. It is worth observing that, for SET-FMLA H-systems, derivations
are linear trees (as rule instances have a single formula in their succedents),
or, in other words, just sequences of formulas built by applications of the rule
instances, thus matching the conventional definition of Hilbert-style systems.


Fig. 1. Graphical representation of $\mathsf{R}$-derivations, for $\mathsf{R}$ finitary. The dashed edges and
blank circles represent other branches that may exist in the derivation. We usually
omit the formulas inherited from the parent node, exhibiting only the ones introduced
by the applied rule of inference. In both cases, we must have $\Gamma \subseteq \Phi$ to enable the
application of the rule.


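A node $n$ of an $\mathsf{R}$-derivation $\mathsf{t}$ is called $\Delta$-closed in case it is a leaf node with
$\ell^{\mathsf{t}}(n) = \ast$ or $\ell^{\mathsf{t}}(n) \cap \Delta \neq \varnothing$. A branch of $\mathsf{t}$ is $\Delta$-closed when it ends in a $\Delta$-closed
node. When every branch in $\mathsf{t}$ is $\Delta$-closed, we say that $\mathsf{t}$ is itself $\Delta$-closed. An
$\mathsf{R}$-proof of a SET-SET statement $(\Phi, \Psi)$ is a $\Psi$-closed $\mathsf{R}$-derivation $\mathsf{t}$ such that
$\ell^{\mathsf{t}}(\mathsf{rt}(\mathsf{t})) \subseteq \Phi$.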
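The notions of derivation, closure and proof translate into a bounded proof search. The following Python sketch is a simplification and not the construction of [13]: it works with a finite list of ground rule instances supplied up front (rather than schematic rules), treats formulas as opaque strings, and uses an arbitrary depth bound, so a False answer only means that no proof was found within that bound. It also skips an expansion when some succedent formula is already in the current label, since the corresponding child would repeat that label.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A ground rule instance (antecedent, succedent); formulas are opaque strings here.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def provable(label: FrozenSet[str], goal: FrozenSet[str], rules: List[Rule], depth: int) -> bool:
    """Search for a goal-closed derivation whose root is labelled `label`."""
    if label & goal:
        return True  # this leaf meets the succedent, so the branch is closed
    if depth == 0:
        return False
    for ante, succ in rules:
        if not ante <= label:
            continue  # the antecedent must be contained in the current label
        if not succ:
            return True  # empty succedent: one child labelled with the discontinuation symbol
        if succ & label:
            continue  # some child would repeat the current label, so this step gains nothing
        if all(provable(label | {w}, goal, rules, depth - 1) for w in succ):
            return True  # one child per succedent formula, and every branch closes
    return False


def derives(Phi: Iterable[str], Psi: Iterable[str], rules: List[Rule], depth: int = 5) -> bool:
    """Bounded check for a Psi-closed derivation rooted at a node labelled Phi."""
    return provable(frozenset(Phi), frozenset(Psi), rules, depth)


RULES = [
    (frozenset({"p|q"}), frozenset({"p", "q"})),  # a branching rule: p|q / p, q
    (frozenset({"p"}), frozenset({"r"})),
    (frozenset({"q"}), frozenset({"r"})),
]
print(derives({"p|q"}, {"r"}, RULES))  # True
print(derives({"p|q"}, {"p"}, RULES))  # False
```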
Consider the binary relation $\rhd_{\mathsf{R}}$ on $\mathcal{P}(L_\Sigma(P))$ such that $\Phi \rhd_{\mathsf{R}} \Psi$ if, and only if,
there is an $\mathsf{R}$-proof of $(\Phi, \Psi)$. This relation is the smallest substitution-invariant
one-dimensional consequence relation containing the rules of inference of $\mathsf{R}$, and
it is finitary when $\mathsf{R}$ is finitary. Since SET-SET (and SET-FMLA) H-systems
canonically induce one-dimensional consequence relations, we may refer to them
as one-dimensional H-systems or one-dimensional axiomatizations. In case there
is a proof of $(\Phi, \Psi)$ whose nodes are labelled only with subsets of $\mathsf{subf}^{\Theta}[\Phi \cup \Psi]$,
we write $\Phi \rhd^{\Theta}_{\mathsf{R}} \Psi$. In case $\rhd_{\mathsf{R}} = \rhd^{\Theta}_{\mathsf{R}}$, we say that $\mathsf{R}$ is $\Theta$-analytic. Note that the
ordinary notion of analyticity obtains when $\Theta = \{p\}$. From now on, whenever
we use the word “analytic” we will mean this extended notion of $\Theta$-analyticity,
for some $\Theta$ implicit in the context. When $\Theta$ happens to be important for us
or we identify any risk of confusion, we will mention it explicitly.

In [13], based on the seminal results on axiomatizability via SET-SET
H-systems by Shoesmith and Smiley [23], it was proved that any non-deterministic
logical matrix $\mathbb{M}$ satisfying a criterion of sufficient expressiveness is
axiomatizable by a $\Theta$-analytic SET-SET Hilbert-style system, which is finite whenever $\mathbb{M}$ is
finite, where $\Theta$ is the set of separators for the pairs of truth-values of $\mathbb{M}$.
According to this criterion, an nd-matrix is sufficiently expressive when, for every pair
$(x, y)$ of distinct truth-values, there is a unary formula $S$, called a separator for
$(x, y)$, such that $S_{\mathbf{A}}(x) \subseteq D$ and $S_{\mathbf{A}}(y) \subseteq \overline{D}$, or vice-versa; in other words, when
every pair of distinct truth-values is separable in $\mathbb{M}$.
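Separators can be searched for mechanically. The sketch below assumes, for simplicity, a total nd-algebra whose connectives are all unary; then a unary formula is just a word of connectives applied to p, each subformula occurs only once, and $S_{\mathbf{A}}(x)$ is obtained by iterating the multifunctions. The search is bounded by a hypothetical max_depth parameter, so None only means that no separator was found up to that depth. The two example matrices are made up: in NEG the pair (b, c) is separated by neg(p), while in COLLAPSE the values b and c behave identically under every word.

```python
from itertools import product
from typing import Dict, FrozenSet, Hashable, Optional, Tuple

# Hypothetical encoding of a monadic, total nd-algebra:
# connective name -> {value: non-empty frozenset of values}.
Algebra = Dict[str, Dict[Hashable, FrozenSet[Hashable]]]


def image(ops: Algebra, word: Tuple[str, ...], x) -> FrozenSet:
    """S_A(x) for S = word[0](word[1](...(p)...)); the empty word stands for p."""
    current = frozenset({x})
    for name in reversed(word):  # apply the innermost connective first
        current = frozenset(z for y in current for z in ops[name][y])
    return current


def find_separator(ops: Algebra, designated, x, y, max_depth: int = 4) -> Optional[Tuple[str, ...]]:
    """Shortest word S with S(x) inside D and S(y) inside the complement of D, or vice versa."""
    for d in range(max_depth + 1):
        for word in product(sorted(ops), repeat=d):
            sx, sy = image(ops, word, x), image(ops, word, y)
            if (sx <= designated and sy.isdisjoint(designated)) or (
                sx.isdisjoint(designated) and sy <= designated
            ):
                return word
    return None


def show(word: Tuple[str, ...]) -> str:
    return "".join(f"{c}(" for c in word) + "p" + ")" * len(word)


D = frozenset({"a"})
NEG = {"neg": {"a": frozenset({"b"}), "b": frozenset({"a"}), "c": frozenset({"c"})}}
COLLAPSE = {"neg": {"a": frozenset({"b"}), "b": frozenset({"a"}), "c": frozenset({"a"})}}
print(show(find_separator(NEG, D, "b", "c")))  # neg(p)
print(find_separator(COLLAPSE, D, "b", "c"))   # None
```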

We emphasize that the adoption of SET-SET H-systems, instead of the more
restricted SET-FMLA H-systems, is essential for the above result. In fact,
while two-valued matrices may always be finitely axiomatized by SET-FMLA H- 
systems [22], there are sufficiently expressive three-valued deterministic matrices 
[21] and even quite simple two-valued non-deterministic matrices [19] that fail to 
be finitely axiomatized by SET-FMLA H-systems. When the nd-matrix at hand is 
not sufficiently expressive, we may observe the same phenomenon of not having 
a finite axiomatization also in terms of SET-SET H-systems, even if the said nd- 
matrix is finite. The first example (and, to the best of our knowledge, the only 
one in the current literature) of this fact appeared in [13], which we reproduce 
here for later reference: 


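Example 1. Consider the signature $\Sigma := \{\Sigma_k\}_{k \in \omega}$ such that $\Sigma_1 := \{g, h\}$ and
$\Sigma_k := \varnothing$ for all $k \neq 1$. Let $\mathbb{M} := \langle \mathbf{A}, \{a\} \rangle$ be a $\Sigma$-nd-matrix, with $A := \{a, b, c\}$
and

$$g_{\mathbf{A}}(x) = \begin{cases} \{a\} & \text{if } x = c \\ A & \text{otherwise} \end{cases} \qquad h_{\mathbf{A}}(x) = \begin{cases} \{b\} & \text{if } x = b \\ A & \text{otherwise} \end{cases}$$

This matrix is not sufficiently expressive because there is no separator for the
pair $(b, c)$, and [13] proved that it is not axiomatizable by a finite SET-SET
H-system, even though an infinite SET-SET system that captures it has a quite
simple description in terms of the following infinite collection of schemas:

$$\frac{h^{i}(p)}{p,\ g(p)}, \quad \text{for all } i \in \omega.$$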
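The claims of Example 1 can be probed mechanically. The following Python sketch assumes our reading of the tables above ($g_{\mathbf{A}}(c) = \{a\}$, $h_{\mathbf{A}}(b) = \{b\}$, every other input mapped to $A$); the helper names `values`, `separates`, `words` and `schema_sound` are ours. Every formula over $\{g, h\}$ is a chain of unary connectives applied to $p$, so its possible values under $v(p) = x$ can be computed by a forward pass. The search only covers chains up to a fixed depth, so it illustrates, rather than proves, the absence of a separator for $(b, c)$.

```python
from itertools import product

A = frozenset("abc")
D = frozenset("a")  # designated values

# Our reading of Example 1: g(c) = {a}, h(b) = {b}; any other input may yield any value.
OPS = {
    "g": lambda x: frozenset("a") if x == "c" else A,
    "h": lambda x: frozenset("b") if x == "b" else A,
}


def values(word, x):
    """Possible values of the chain `word` applied to p, given v(p) = x.

    `word` lists connectives from the innermost outwards ("hg" means g(h(p))).
    Every subformula of a chain is a prefix of it, so a forward pass is exact."""
    current = {x}
    for c in word:
        current = set().union(*(OPS[c](y) for y in current))
    return frozenset(current)


def separates(word, x, y):
    vx, vy = values(word, x), values(word, y)
    return (vx <= D and not vy & D) or (vy <= D and not vx & D)


def words(max_depth):
    for n in range(max_depth + 1):
        for w in product("gh", repeat=n):
            yield "".join(w)


def schema_sound(i):
    """Soundness of h^i(p) / p, g(p): no valuation designates h^i(p) while
    leaving both p and g(p) undesignated. Given v(p), the values of g(p)
    and h^i(p) are chosen independently, so each side is tested separately."""
    for x in A:
        can_designate_hi = bool(values("h" * i, x) & D)
        can_undesignate_g = bool(values("g", x) - D)
        if can_designate_hi and x not in D and can_undesignate_g:
            return False
    return True


print(any(separates(w, "b", "c") for w in words(8)))  # False
print(all(schema_sound(i) for i in range(10)))         # True
```

The forward pass is what keeps this simple: for formulas that use only unary connectives, the non-deterministic choices never have to be correlated across different branches of the formula.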
In the next section, we reveal another example of this same phenomenon, this
time of the known LFI [14] called mCi. On the way to proving that this logic
is not axiomatizable by a finite SET-SET H-system, we will show that there are
infinitely many LFIs between mbC and mCi, organized in a strictly increasing
chain whose limit is mCi itself.

Before continuing, it is worth emphasizing that any given non-sufficiently 
expressive nd-matrix may be conservatively extended to a sufficiently expressive 
nd-matrix provided new connectives are added to the language [18]. These new 
connectives have the sole purpose of separating the pairs of truth-values for which 
no separator is available in the original language. The SET-SET system produced 
from this extended nd-matrix can, then, be used to reason over the original 
logic, since the extension is conservative. However, these new connectives, which 
a priori have no meaning, are very likely to appear in derivations of consecutions 
of the original logic. This might not look like an attractive option to inferentialists 
who believe that purity of the schematic rules governing a given logical constant 
is essential for the meaning of the latter to be coherently fixed. In the subsequent 
sections, we will introduce and apply a potentially more expressive notion of logic 
in order to provide a finite and analytic H-system for logics that are not finitely 
axiomatizable in one dimension, while preserving their original languages. 


4 The Logic mCi is Not Finitely Axiomatizable 


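A one-dimensional logic $\rhd$ over $\Sigma$ is said to be $\neg$-paraconsistent when we have
$p, \neg p \ntriangleright q$, for $p, q \in P$. Moreover, $\rhd$ is $\neg$-gently explosive in case there is a
collection $\bigcirc(p) \subseteq L_\Sigma(P)$ of unary formulas such that, for some $\varphi \in L_\Sigma(P)$, we
have $\bigcirc(\varphi), \varphi \ntriangleright \varnothing$ and $\bigcirc(\varphi), \neg\varphi \ntriangleright \varnothing$, and, for all $\varphi \in L_\Sigma(P)$, $\bigcirc(\varphi), \varphi, \neg\varphi \rhd \varnothing$. We
say that $\rhd$ is a logic of formal inconsistency (LFI) in case it is $\neg$-paraconsistent
yet $\neg$-gently explosive. In case $\bigcirc(p) = \{\circ p\}$, for $\circ$ a (primitive or composite)
consistency connective, the logic is said also to be a C-system. In what follows,
let $\Sigma^{\circ}$ be the propositional signature such that $\Sigma^{\circ}_1 := \{\neg, \circ\}$, $\Sigma^{\circ}_2 := \{\wedge, \vee, \supset\}$,
and $\Sigma^{\circ}_k := \varnothing$ for all $k \notin \{1, 2\}$.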

One of the simplest C-systems is the logic mbC, which was first presented in
terms of a SET-FMLA H-system over $\Sigma^{\circ}$ obtained by extending any SET-FMLA
H-system for positive classical logic ($\mathbf{CPL}^{+}$) with the following pair of axiom
schemas:

(em) $p \vee \neg p$
(bc1) $\circ p \supset (p \supset (\neg p \supset q))$


The logic mCi, in turn, is the C-system resulting from extending the
H-system for mbC with the following (infinitely many) axiom schemas [20] (the
resulting SET-FMLA H-system is denoted here by $\mathcal{H}_{\mathbf{mCi}}$):

(ci) $\neg\circ p \supset (p \wedge \neg p)$
(ci)$_j$ $\circ\neg^{j}\circ p$ (for all $0 \leq j < \omega$)
A unary connective $\circledast$ is said to constitute a classical negation in a
one-dimensional logic $\rhd$ extending $\mathbf{CPL}^{+}$ in case, for all $\varphi, \psi \in L_\Sigma(P)$, $\varnothing \rhd \varphi \vee \circledast(\varphi)$
and $\varnothing \rhd \varphi \supset (\circledast(\varphi) \supset \psi)$. One of the main differences between mCi and mbC is
that an inconsistency connective $\bullet$ may be defined in the former using the
paraconsistent negation, instead of a classical negation, by setting $\bullet\varphi := \neg\circ\varphi$ [20].

Both logics above were presented in [15] in ways other than H-systems: via 
tableau systems, via bivaluation semantics and via possible-translations seman- 
tics. In addition, while these logics are known not to be characterizable by a 
single finite deterministic matrix [20], a characteristic nd-matrix is available for 
mbC [1] and a 5-valued non-deterministic logical matrix is available for mCi [2], 
witnessing the importance of non-deterministic semantics in the study of non- 
classical logics. Such characterizations, moreover, allow for the extraction of 
sequent-style systems for these logics by the methodologies developed in [3,4]. 
Since mCi’s 5-valued nd-matrix will be useful for us in future sections, we recall 
it below for ease of reference. 


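Definition 1. Let $V_5 := \{f, F, I, T, t\}$ and $Y_5 := \{I, T, t\}$. Define the $\Sigma^{\circ}$-matrix
$\mathbb{M}_{\mathbf{mCi}} := \langle \mathbf{A}_5, Y_5 \rangle$ such that $\mathbf{A}_5 := \langle V_5, \cdot_{\mathbf{A}_5} \rangle$ interprets the connectives of $\Sigma^{\circ}$
according to the following:

$$x_1 \wedge_{\mathbf{A}_5} x_2 := \begin{cases} \{f\} & \text{if either } x_1 \notin Y_5 \text{ or } x_2 \notin Y_5 \\ \{I, t\} & \text{otherwise} \end{cases}$$

$$x_1 \vee_{\mathbf{A}_5} x_2 := \begin{cases} \{I, t\} & \text{if either } x_1 \in Y_5 \text{ or } x_2 \in Y_5 \\ \{f\} & \text{if } x_1, x_2 \notin Y_5 \end{cases}$$

$$x_1 \supset_{\mathbf{A}_5} x_2 := \begin{cases} \{I, t\} & \text{if either } x_1 \notin Y_5 \text{ or } x_2 \in Y_5 \\ \{f\} & \text{if } x_1 \in Y_5 \text{ and } x_2 \notin Y_5 \end{cases}$$

$$\begin{array}{c|ccccc} & f & F & I & T & t \\ \hline \neg_{\mathbf{A}_5} & \{I, t\} & \{T\} & \{I, t\} & \{F\} & \{f\} \\ \circ_{\mathbf{A}_5} & \{T\} & \{T\} & \{F\} & \{T\} & \{T\} \end{array}$$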
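To make the tables concrete, here is a small Python sketch (our own encoding, not code from the cited works) that computes the possible values of a formula in $\mathbb{M}_{\mathbf{mCi}}$ by enumerating every legal valuation restricted to the subformulas of the input. It uses this to check that instances of (em), (bc1), (ci) and the first few (ci)$_j$ are valid, while $(p \wedge \neg p) \supset q$ is not, reflecting $\neg$-paraconsistency. Formulas are nested tuples such as `("imp", "p", "q")`; the operator tags and helper names are illustrative assumptions.

```python
from itertools import product

V5 = ("f", "F", "I", "T", "t")
Y5 = frozenset({"I", "T", "t"})  # designated values


def conj(x, y):
    return {"f"} if x not in Y5 or y not in Y5 else {"I", "t"}


def disj(x, y):
    return {"I", "t"} if x in Y5 or y in Y5 else {"f"}


def impl(x, y):
    return {"f"} if x in Y5 and y not in Y5 else {"I", "t"}


NEG = {"f": {"I", "t"}, "F": {"T"}, "I": {"I", "t"}, "T": {"F"}, "t": {"f"}}
CIRC = {"f": {"T"}, "F": {"T"}, "I": {"F"}, "T": {"T"}, "t": {"T"}}
OPS = {"and": conj, "or": disj, "imp": impl,
       "not": NEG.__getitem__, "circ": CIRC.__getitem__}


def subformulas(phi, acc):
    """Collect distinct subformulas in post-order (arguments before compounds)."""
    if isinstance(phi, tuple):
        for arg in phi[1:]:
            subformulas(arg, acc)
    if phi not in acc:
        acc.append(phi)
    return acc


def possible_values(phi):
    """All values v(phi) for legal valuations v, restricted to subformulas of phi."""
    subs = subformulas(phi, [])
    atoms = [s for s in subs if isinstance(s, str)]
    compounds = [s for s in subs if isinstance(s, tuple)]
    found = set()

    def extend(v, i):
        if i == len(compounds):
            found.add(v[phi])
            return
        op, *args = compounds[i]
        for value in OPS[op](*(v[a] for a in args)):  # non-deterministic choice
            extend({**v, compounds[i]: value}, i + 1)

    for choice in product(V5, repeat=len(atoms)):
        extend(dict(zip(atoms, choice)), 0)
    return found


def valid(phi):
    return possible_values(phi) <= Y5


p, q = "p", "q"


def neg(a):
    return ("not", a)


def circ(a):
    return ("circ", a)


def negs(j, a):
    for _ in range(j):
        a = neg(a)
    return a


em = ("or", p, neg(p))
bc1 = ("imp", circ(p), ("imp", p, ("imp", neg(p), q)))
ci = ("imp", neg(circ(p)), ("and", p, neg(p)))
cc = [circ(negs(j, circ(p))) for j in range(5)]

print(valid(em), valid(bc1), valid(ci), all(valid(f) for f in cc))  # True True True True
print(valid(("imp", ("and", p, neg(p)), q)))                        # False
```

A shared subformula receives a single value per valuation, which is why the sketch enumerates valuations over the subformula set instead of combining value sets independently.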


One might be tempted to apply the axiomatization algorithm of [13] to the
finite non-deterministic logical matrix defined above to obtain a finite and
analytic SET-SET system for mCi. However, it is not obvious, at first, whether this
matrix is sufficiently expressive or not (we will, in fact, prove that it is not).
In what follows, we will show that mCi is actually axiomatizable neither by a
finite SET-FMLA H-system (first part), nor by a finite SET-SET H-system (second
part); it so happens, thus, that it was not by chance that $\mathcal{H}_{\mathbf{mCi}}$ was
originally presented with infinitely many rule schemas. For the first part, we rely
on the following general result:


Theorem 1 ([25], Theorem 2.2.8, adapted). Let $\vdash$ be a standard Tarskian
consequence relation. Then $\vdash$ is axiomatizable by a finite SET-FMLA H-system
if, and only if, there is no strictly increasing sequence $\vdash_0, \vdash_1, \ldots, \vdash_i, \ldots$ of standard
Tarskian consequence relations such that $\vdash \, = \bigcup_{i \in \omega} \vdash_i$.
In order to apply the above theorem, we first present a family of finite SET-FMLA
H-systems that, in the sequel, will be used to provide an increasing sequence
of standard Tarskian consequence relations whose supremum is precisely mCi.
Next, we show that this sequence is strictly increasing, by employing the matrix
methodology traditionally used for showing the independence of axioms in a
proof system.


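Definition 2. For each $k \in \omega$, let $\mathcal{H}^{k}_{\mathbf{mCi}}$ be a SET-FMLA H-system for positive
classical logic together with the schemas (em), (bc1), (ci) and (ci)$_j$, for all $0 \leq j \leq k$.


Since $\mathcal{H}^{k}_{\mathbf{mCi}}$ may be obtained from $\mathcal{H}_{\mathbf{mCi}}$ by deleting some (infinitely many)
axioms, it is immediate that:


Proposition 1. For every $k \in \omega$, $\vdash_{\mathcal{H}^{k}_{\mathbf{mCi}}} \subseteq \vdash_{\mathbf{mCi}}$.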


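The way we define the promised increasing sequence of consequence relations
in the next result is by taking the systems $\mathcal{H}^{k}_{\mathbf{mCi}}$ with odd superscripts, namely,
we will be working with the sequence $\vdash_{\mathcal{H}^{1}_{\mathbf{mCi}}}, \vdash_{\mathcal{H}^{3}_{\mathbf{mCi}}}, \vdash_{\mathcal{H}^{5}_{\mathbf{mCi}}}, \ldots$. Excluding the cases
where $k$ is even will facilitate, in particular, the proof of Lemma 3.


Lemma 1. For each $1 \leq k < \omega$, let $\vdash_k \, := \, \vdash_{\mathcal{H}^{2k-1}_{\mathbf{mCi}}}$. Then $\vdash_1 \subseteq \vdash_2 \subseteq \cdots$ and
$\vdash_{\mathbf{mCi}} \, = \bigcup_{1 \leq k < \omega} \vdash_k$.


Finally, we prove that the sequence outlined in the paragraph before Lemma 1
is strictly increasing. In order to achieve this, we define, for each $1 \leq k < \omega$, a
$\Sigma^{\circ}$-matrix $\mathbb{M}_k$ and prove that $\mathcal{H}^{2k-1}_{\mathbf{mCi}}$ is sound with respect to such matrix. Then,
in the second part of the proof (the "independence part"), we show that, for each
$1 \leq k < \omega$, $\mathbb{M}_k$ fails to validate the rule schema (ci)$_j$, for $j = 2k$, which is present
in $\mathcal{H}^{2(k+1)-1}_{\mathbf{mCi}}$. In this way, by the contrapositive of the soundness result proved
in the first part, we will have (ci)$_{2k}$ provable in $\mathcal{H}^{2(k+1)-1}_{\mathbf{mCi}}$ while unprovable in
$\mathcal{H}^{2k-1}_{\mathbf{mCi}}$. In what follows, for any $k \in \omega$, we use $k^*$ to refer to the successor of $k$.


Definition 3. Let $1 \leq k < \omega$. Define the $2k^*$-valued $\Sigma^{\circ}$-matrix $\mathbb{M}_k := \langle \mathbf{A}_k, D_k \rangle$
such that $D_k := \{k^*+1, \ldots, 2k^*\}$ and $\mathbf{A}_k := \langle \{1, \ldots, 2k^*\}, \cdot_{\mathbf{A}_k} \rangle$, the interpretation
of $\Sigma^{\circ}$ in $\mathbf{A}_k$ given by the following operations:

$$x \vee_{\mathbf{A}_k} y := \begin{cases} 1 & \text{if } x, y \notin D_k \\ k^*+1 & \text{otherwise} \end{cases} \qquad x \wedge_{\mathbf{A}_k} y := \begin{cases} k^*+1 & \text{if } x, y \in D_k \\ 1 & \text{otherwise} \end{cases}$$

$$x \supset_{\mathbf{A}_k} y := \begin{cases} 1 & \text{if } x \in D_k \text{ and } y \notin D_k \\ k^*+1 & \text{otherwise} \end{cases}$$

$$\circ_{\mathbf{A}_k} x := \begin{cases} 1 & \text{if } x = 2k^* \\ k^*+1 & \text{otherwise} \end{cases} \qquad \neg_{\mathbf{A}_k} x := \begin{cases} k^*+1 & \text{if } x \in \{1, 2k^*\} \\ x + k^* & \text{if } 2 \leq x \leq k^* \\ x - (k^*-1) & \text{if } k^*+1 \leq x \leq 2k^*-1 \end{cases}$$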
Before continuing, we state results concerning this construction, which will be
used in the remainder of the current line of argumentation. In what follows, when
there is no risk of confusion, we omit the subscript $\mathbf{A}_k$ from the interpretations
to simplify the notation.


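Lemma 2. For all $k \geq 1$ and $1 \leq m \leq 2k$,

$$\neg^{m}(k^*+1) = \begin{cases} (k^*+1) + \frac{m}{2} & \text{if } m \text{ is even} \\ 1 + \frac{m+1}{2} & \text{otherwise} \end{cases}$$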


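Lemma 3. For all $1 \leq k < \omega$, we have $\vdash_{\mathcal{H}^{2(k+1)-1}_{\mathbf{mCi}}} \circ\neg^{2k}\circ p$ but $\nvdash_{\mathcal{H}^{2k-1}_{\mathbf{mCi}}} \circ\neg^{2k}\circ p$.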
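The matrices of Definition 3 are deterministic, so Lemma 2 and the independence part of the argument can be spot-checked by direct evaluation. The Python sketch below (function names are ours) encodes $\mathbb{M}_k$ as read from Definition 3 and, for $k = 1, \ldots, 5$, checks the closed form of Lemma 2, checks that (em), (bc1), (ci) and (ci)$_j$ for $j < 2k$ are valid in $\mathbb{M}_k$, and checks that (ci)$_{2k}$ is not. It does not check the axioms of positive classical logic, and a finite check is no substitute for the proofs.

```python
from itertools import product


def make_Mk(k):
    """Return k*, the designated set D_k and the operations of M_k (Definition 3)."""
    ks = k + 1                                   # k* is the successor of k
    D = frozenset(range(ks + 1, 2 * ks + 1))     # {k*+1, ..., 2k*}

    def neg(x):
        if x in (1, 2 * ks):
            return ks + 1
        if 2 <= x <= ks:
            return x + ks
        return x - (ks - 1)                      # k*+1 <= x <= 2k*-1

    def circ(x):
        return 1 if x == 2 * ks else ks + 1

    ops = {
        "or": lambda x, y: 1 if x not in D and y not in D else ks + 1,
        "and": lambda x, y: ks + 1 if x in D and y in D else 1,
        "imp": lambda x, y: 1 if x in D and y not in D else ks + 1,
        "not": neg,
        "circ": circ,
    }
    return ks, D, ops


def iterate(f, m, x):
    for _ in range(m):
        x = f(x)
    return x


def lemma2_holds(k):
    ks, _, ops = make_Mk(k)
    for m in range(1, 2 * k + 1):
        expected = (ks + 1) + m // 2 if m % 2 == 0 else 1 + (m + 1) // 2
        if iterate(ops["not"], m, ks + 1) != expected:
            return False
    return True


def cc_valid(k, j):
    """Is the instance (ci)_j, i.e. circ not^j circ p, designated for every v(p)?"""
    ks, D, ops = make_Mk(k)
    neg, circ = ops["not"], ops["circ"]
    return all(circ(iterate(neg, j, circ(x))) in D for x in range(1, 2 * ks + 1))


def other_axioms_valid(k):
    """(em), (bc1) and (ci) under every assignment of values to p and q."""
    ks, D, o = make_Mk(k)
    for x, y in product(range(1, 2 * ks + 1), repeat=2):
        em = o["or"](x, o["not"](x))
        bc1 = o["imp"](o["circ"](x), o["imp"](x, o["imp"](o["not"](x), y)))
        ci = o["imp"](o["not"](o["circ"](x)), o["and"](x, o["not"](x)))
        if not {em, bc1, ci} <= D:
            return False
    return True


for k in range(1, 6):
    print(k, lemma2_holds(k), other_axioms_valid(k),
          all(cc_valid(k, j) for j in range(2 * k)), cc_valid(k, 2 * k))
# each printed line reads: k True True True False
```

The failing instance comes from any $v(p) \neq 2k^*$: then $\circ p$ takes the value $k^*+1$, Lemma 2 sends $\neg^{2k}(k^*+1)$ to $2k^*$, and $\circ(2k^*) = 1$ is undesignated.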


Finally, Theorem 1, Lemma 1 and Lemma 3 give us the main result: 
Theorem 2. mCi is not axiomatizable by a finite SET-FMLA H-system. 


For the second part (namely, that no finite SET-SET H-system axiomatizes
mCi), we make use of the following result:


Theorem 3 ([23], Theorem 5.37, adapted). Let $\rhd$ be a one-dimensional
consequence relation over a propositional signature containing the binary
connective $\vee$. Suppose that the SET-FMLA Tarskian companion of $\rhd$, denoted by
$\vdash$, satisfies the following property:

$$\Phi, \varphi \vee \psi \vdash \gamma \quad \text{if and only if} \quad \Phi, \varphi \vdash \gamma \text{ and } \Phi, \psi \vdash \gamma \qquad \text{(Disj)}$$

If a SET-SET H-system $\mathcal{R}$ axiomatizes $\rhd$, then $\mathcal{R}$ may be converted into a
SET-FMLA H-system for $\vdash$ that is finite whenever $\mathcal{R}$ is finite.


It turns out that: 
Lemma 4. mCi satisfies (Disj). 


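Proof. The non-deterministic semantics of mCi gives us that, for all $\varphi, \psi \in L_{\Sigma^{\circ}}(P)$,
$\varphi \rhd_{\mathbb{M}_{\mathbf{mCi}}} \varphi \vee \psi$; $\psi \rhd_{\mathbb{M}_{\mathbf{mCi}}} \varphi \vee \psi$; and $\varphi \vee \psi \rhd_{\mathbb{M}_{\mathbf{mCi}}} \varphi, \psi$, and such facts
easily imply (Disj).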


Theorem 4. mCi is not axiomatizable by a finite SET-SET H-system. 


Proof. If R were a finite SET-SET H-system for mCi, then, by Lemma 4 and 
Theorem 3, it could be turned into a finite SET-FMLA H-system for this very 
logic. This would contradict Theorem 2. 


Finding a finite one-dimensional H-system for mCi (analytic or not) over the 
same language, then, proved to be impossible. The previous result also tells us 
that there is no sufficiently expressive non-deterministic matrix that character- 
izes mCi (for otherwise the recipe in [13] would deliver a finite analytic SET-SET 
H-system for it), and we may conclude, in particular, that: 
Corollary 1. The nd-matrix $\mathbb{M}_{\mathbf{mCi}}$ is not sufficiently expressive.


The pairs of truth-values of $\mathbb{M}_{\mathbf{mCi}}$ that seem not to be separable (at least
one of these pairs must not be, in view of the above corollary) are $(t, T)$ and
$(f, F)$. The insufficiency of expressive power to take these specific pairs of values
apart, however, would be circumvented if we had considered instead the matrix
defined below, obtained from $\mathbb{M}_{\mathbf{mCi}}$ by changing its set of designated values:


Definition 4. Let $\mathbb{M}^{\mathsf{N}}_{\mathbf{mCi}} := \langle \mathbf{A}_5, N_5 \rangle$, where $N_5 := \{f, I, T\}$.


Note that, in $\mathbb{M}^{\mathsf{N}}_{\mathbf{mCi}}$, we have $t \notin N_5$, while $T \in N_5$, and we have that $f \in N_5$,
while $F \notin N_5$. Therefore, the single propositional variable $p$ separates in $\mathbb{M}^{\mathsf{N}}_{\mathbf{mCi}}$
the pairs $(t, T)$ and $(f, F)$. On the other hand, it is not clear now whether the
pairs $(t, F)$ and $(f, T)$ are separable in this new matrix. Nonetheless, we will
see, in the next section, how we can take advantage of the semantics of
non-deterministic B-matrices in order to combine the expressiveness of $\mathbb{M}_{\mathbf{mCi}}$ and
$\mathbb{M}^{\mathsf{N}}_{\mathbf{mCi}}$ in a very simple and intuitive manner, preserving the language and the
algebra shared by these matrices. The notion of logic induced by the resulting
structure will not be one-dimensional, as the one presented before, but rather
two-dimensional, in a sense we shall detail in a moment. We identify two important
aspects of this combination: first, the logics determined by the original
matrices can be fully recovered from the combined logic; and, second, since the
notions of H-systems and sufficient expressiveness, as well as the axiomatization
algorithm of [13], were generalized in [17], the resulting two-dimensional logic
may be algorithmically axiomatized by an analytic two-dimensional H-system
that is finite if the combining matrices are finite, provided the criterion of
sufficient expressiveness is satisfied after the combination. This will be the case, in
particular, when we combine $\mathbb{M}_{\mathbf{mCi}}$ and $\mathbb{M}^{\mathsf{N}}_{\mathbf{mCi}}$. Consequently, this novel way of
combining logics provides a quite general approach for producing finite and
analytic axiomatizations for logics determined by non-deterministic logical matrices
that fail to be finitely axiomatizable in one dimension; this includes the logics
from Example 1, and also mCi.
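Which pairs are separable can be explored by brute force. The sketch below (our own helper names; it assumes the tables of Definition 1) enumerates formulas built from $p$ by chains of $\neg$ and $\circ$ only, up to a fixed depth. Such a search can exhibit separators but cannot rule them out in general; here its output is consistent with the discussion above: with $Y_5$ as the designated set, exactly $(f, F)$ and $(T, t)$ remain unseparated by these chains, whereas with $N_5$ the variable $p$ alone separates both pairs.

```python
from itertools import combinations, product

V5 = ("f", "F", "I", "T", "t")
Y5 = frozenset({"I", "T", "t"})   # designated set of M_mCi
N5 = frozenset({"f", "I", "T"})   # designated set of M^N_mCi
NEG = {"f": {"I", "t"}, "F": {"T"}, "I": {"I", "t"}, "T": {"F"}, "t": {"f"}}
CIRC = {"f": {"T"}, "F": {"T"}, "I": {"F"}, "T": {"T"}, "t": {"T"}}
UNARY = {"not": NEG, "circ": CIRC}


def chain_values(chain, x):
    """Possible values of a chain of unary connectives (innermost first) on p, with v(p) = x."""
    current = {x}
    for c in chain:
        current = set().union(*(UNARY[c][y] for y in current))
    return current


def separates(chain, x, y, D):
    vx, vy = chain_values(chain, x), chain_values(chain, y)
    return (vx <= D and not vy & D) or (vy <= D and not vx & D)


def unseparated_pairs(D, depth):
    chains = [c for n in range(depth + 1) for c in product(UNARY, repeat=n)]
    return {(x, y) for x, y in combinations(V5, 2)
            if not any(separates(c, x, y, D) for c in chains)}


print(sorted(unseparated_pairs(Y5, 6)))                         # [('T', 't'), ('f', 'F')]
print(separates((), "t", "T", N5), separates((), "f", "F", N5))  # True True
print(sorted(unseparated_pairs(N5, 6)))  # pairs these chains fail to separate in M^N_mCi
```

Restricting the search to chains keeps the value computation exact with a simple forward pass; formulas mixing binary connectives would need the valuation enumeration used for Definition 1.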


5 Two-Dimensional Logics 


From now on, we will employ the symbols Y, ⅄, N and И to informally refer to,
respectively, the cognitive attitudes of acceptance, non-acceptance, rejection and
non-rejection, collected in the set $\mathsf{Atts} := \{\mathsf{Y}, ⅄, \mathsf{N}, И\}$. Given a set $\Phi \subseteq L_\Sigma(P)$,
we will write $\Phi_\alpha$ to intuitively mean that a given agent entertains the cognitive
attitude $\alpha \in \mathsf{Atts}$ with respect to the formulas in $\Phi$, that is: the formulas in
$\Phi_{\mathsf{Y}}$ will be understood as being accepted by the agent; the ones in $\Phi_{⅄}$, as
non-accepted; the ones in $\Phi_{\mathsf{N}}$, as rejected; and the ones in $\Phi_{И}$, as non-rejected. Where
$\alpha \in \mathsf{Atts}$, we let $\tilde{\alpha}$ be its flipped version, that is, $\tilde{\mathsf{Y}} := ⅄$, $\tilde{⅄} := \mathsf{Y}$, $\tilde{\mathsf{N}} := И$ and
$\tilde{И} := \mathsf{N}$.


We refer to each $\left(\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \,\middle|\, \frac{\Phi_{⅄}}{\Phi_{И}}\right) \in \mathcal{P}(L_\Sigma(P))^2 \times \mathcal{P}(L_\Sigma(P))^2$ as a B-statement, where
$(\Phi_{\mathsf{Y}}, \Phi_{\mathsf{N}})$ is the antecedent and $(\Phi_{⅄}, \Phi_{И})$ is the succedent. The sets in these
pairs are called components. A B-consequence relation is a collection $\mathrel{\cdot|\cdot}$ of
B-statements satisfying:


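(O2) if $\Phi_{\mathsf{Y}} \cap \Phi_{⅄} \neq \varnothing$ or $\Phi_{\mathsf{N}} \cap \Phi_{И} \neq \varnothing$, then $\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \mathrel{\cdot|\cdot} \frac{\Phi_{⅄}}{\Phi_{И}}$
(D2) if $\frac{\Psi_{\mathsf{N}}}{\Psi_{\mathsf{Y}}} \mathrel{\cdot|\cdot} \frac{\Psi_{⅄}}{\Psi_{И}}$ and $\Psi_\alpha \subseteq \Phi_\alpha$ for every $\alpha \in \mathsf{Atts}$, then $\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \mathrel{\cdot|\cdot} \frac{\Phi_{⅄}}{\Phi_{И}}$
(C2) if $\frac{\Omega_{\mathsf{N}}}{\Omega_{\mathsf{Y}}} \mathrel{\cdot|\cdot} \frac{\Omega_{\mathsf{Y}}^{c}}{\Omega_{\mathsf{N}}^{c}}$ for all $\Phi_{\mathsf{Y}} \subseteq \Omega_{\mathsf{Y}} \subseteq \Phi_{⅄}^{c}$ and $\Phi_{\mathsf{N}} \subseteq \Omega_{\mathsf{N}} \subseteq \Phi_{И}^{c}$, then $\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \mathrel{\cdot|\cdot} \frac{\Phi_{⅄}}{\Phi_{И}}$

where $\Gamma^{c} := L_\Sigma(P) \setminus \Gamma$. A B-consequence relation is called substitution-invariant if, in
addition, for every $\sigma \in \mathsf{Subs}$:

(S2) if $\frac{\Psi_{\mathsf{N}}}{\Psi_{\mathsf{Y}}} \mathrel{\cdot|\cdot} \frac{\Psi_{⅄}}{\Psi_{И}}$ and $\Phi_\alpha = \sigma(\Psi_\alpha)$ for every $\alpha \in \mathsf{Atts}$, then $\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \mathrel{\cdot|\cdot} \frac{\Phi_{⅄}}{\Phi_{И}}$

Moreover, a B-consequence relation is called finitary when it enjoys the property

(F2) if $\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \mathrel{\cdot|\cdot} \frac{\Phi_{⅄}}{\Phi_{И}}$, then $\frac{\Phi^{f}_{\mathsf{N}}}{\Phi^{f}_{\mathsf{Y}}} \mathrel{\cdot|\cdot} \frac{\Phi^{f}_{⅄}}{\Phi^{f}_{И}}$ for some finite $\Phi^{f}_\alpha \subseteq \Phi_\alpha$, for each $\alpha \in \mathsf{Atts}$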
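In what follows, B-consequence relations will also be referred to as
two-dimensional logics. The complement of $\mathrel{\cdot|\cdot}$, sometimes called the compatibility relation
associated with $\mathrel{\cdot|\cdot}$ [10], will be denoted by $\mathrel{\cdot\nmid\cdot}$. Every B-consequence relation
$\mathcal{C} := \, \mathrel{\cdot|\cdot}$ induces one-dimensional consequence relations $\rhd^{\mathsf{t}}_{\mathcal{C}}$ and $\rhd^{\mathsf{f}}_{\mathcal{C}}$, such that
$\Phi \rhd^{\mathsf{t}}_{\mathcal{C}} \Psi$ iff $\frac{\varnothing}{\Phi} \mathrel{\cdot|\cdot} \frac{\Psi}{\varnothing}$, and $\Phi \rhd^{\mathsf{f}}_{\mathcal{C}} \Psi$ iff $\frac{\Phi}{\varnothing} \mathrel{\cdot|\cdot} \frac{\varnothing}{\Psi}$. Given a one-dimensional consequence
relation $\rhd$, we say that it inhabits the t-aspect of $\mathcal{C}$ if $\rhd = \rhd^{\mathsf{t}}_{\mathcal{C}}$, and that it
inhabits the f-aspect of $\mathcal{C}$ if $\rhd = \rhd^{\mathsf{f}}_{\mathcal{C}}$. B-consequence relations actually induce
many other (even non-Tarskian) one-dimensional notions of logics; the reader is
referred to [9,11] for a thorough presentation on this topic.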

As we did for one-dimensional consequence relations, we now present
realizations of B-consequence relations, first via the semantics of nd-B-matrices, then
by means of two-dimensional H-systems.

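A non-deterministic B-matrix over $\Sigma$, or simply $\Sigma$-nd-B-matrix, is a structure
$\mathfrak{M} := \langle \mathbf{A}, \mathsf{Y}, \mathsf{N} \rangle$, where $\mathbf{A}$ is a $\Sigma$-nd-algebra, $\mathsf{Y} \subseteq A$ is the set of designated
values and $\mathsf{N} \subseteq A$ is the set of antidesignated values of $\mathfrak{M}$. For convenience, we
define $⅄ := A \setminus \mathsf{Y}$ to be the set of non-designated values, and $И := A \setminus \mathsf{N}$ to be
the set of non-antidesignated values of $\mathfrak{M}$. The elements of $\mathrm{Val}_\Sigma(\mathbf{A})$ are dubbed
$\mathfrak{M}$-valuations. The B-entailment relation determined by $\mathfrak{M}$ is a collection $\mathrel{\cdot|\cdot}_{\mathfrak{M}}$
of B-statements such that

$$\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \mathrel{\cdot|\cdot}_{\mathfrak{M}} \frac{\Phi_{⅄}}{\Phi_{И}} \quad \text{iff there is no } \mathfrak{M}\text{-valuation } v \text{ such that } v(\Phi_\alpha) \subseteq \alpha \text{ for each } \alpha \in \mathsf{Atts}, \qquad \text{(B-ent)}$$

for every $\Phi_{\mathsf{Y}}, \Phi_{\mathsf{N}}, \Phi_{⅄}, \Phi_{И} \subseteq L_\Sigma(P)$. Whenever $\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \mathrel{\cdot|\cdot}_{\mathfrak{M}} \frac{\Phi_{⅄}}{\Phi_{И}}$, we say that the
B-statement $\left(\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \,\middle|\, \frac{\Phi_{⅄}}{\Phi_{И}}\right)$ holds in $\mathfrak{M}$ or is valid in $\mathfrak{M}$. An $\mathfrak{M}$-valuation that bears
witness to $\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \mathrel{\cdot\nmid\cdot}_{\mathfrak{M}} \frac{\Phi_{⅄}}{\Phi_{И}}$ is called a countermodel for $\left(\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \,\middle|\, \frac{\Phi_{⅄}}{\Phi_{И}}\right)$ in $\mathfrak{M}$. One may
easily check that $\mathrel{\cdot|\cdot}_{\mathfrak{M}}$ is a substitution-invariant B-consequence relation that is
finitary when $A$ is finite. Taking $\mathcal{C}$ as $\mathrel{\cdot|\cdot}_{\mathfrak{M}}$, we define $\rhd^{\mathsf{t}}_{\mathfrak{M}} := \rhd^{\mathsf{t}}_{\mathcal{C}}$ and $\rhd^{\mathsf{f}}_{\mathfrak{M}} := \rhd^{\mathsf{f}}_{\mathcal{C}}$.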


Finite Two-Dimensional Proof Systems 653 


Fig. 2. Graphical representation of finite $\mathcal{R}$-derivations. We emphasize that, in both
cases, we must have $\Psi_{\mathsf{Y}} \subseteq \Phi_{\mathsf{Y}}$ and $\Psi_{\mathsf{N}} \subseteq \Phi_{\mathsf{N}}$ to enable the application of the rule.


We move now to two-dimensional, or SET²-SET², H-systems, first introduced
in [17]. A (schematic) SET²-SET² rule of inference $R_s$ is the collection of all
substitution instances of the SET²-SET² statement $s$, called the schema of $R_s$. Each
$r \in R_s$ is said to be a rule instance of $R_s$. In a proof-theoretic context, rather
than writing the B-statement $\left(\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \,\middle|\, \frac{\Phi_{⅄}}{\Phi_{И}}\right)$, we shall denote the corresponding rule
by $\frac{\Phi_{\mathsf{Y}} \,\|\, \Phi_{\mathsf{N}}}{\Phi_{⅄} \,\|\, \Phi_{И}}$. A (schematic) SET²-SET² H-system $\mathcal{R}$ is a collection of SET²-SET²
rules of inference. SET²-SET² derivations are as in the SET-SET H-systems, but
now the nodes are labelled with pairs of sets of formulas, instead of a single set.
When applying a rule instance, each formula in the succedent produces a new
branch as before, but now the formula goes to the same component in which
it was found in the rule instance. See Fig. 2 for a general representation and
compare it with Fig. 1.

Let $t$ be an $\mathcal{R}$-derivation. A node $n$ of $t$ is $(\Psi_{⅄}, \Psi_{И})$-closed in case it is
discontinued (namely, labelled with $*$) or it is a leaf node with $\ell^{t}(n) = (\Delta_{\mathsf{Y}}, \Delta_{\mathsf{N}})$
and either $\Delta_{\mathsf{Y}} \cap \Psi_{⅄} \neq \varnothing$ or $\Delta_{\mathsf{N}} \cap \Psi_{И} \neq \varnothing$. A branch of $t$ is $(\Psi_{⅄}, \Psi_{И})$-closed
when it ends in a $(\Psi_{⅄}, \Psi_{И})$-closed node. An $\mathcal{R}$-derivation $t$ is said to be
$(\Psi_{⅄}, \Psi_{И})$-closed when all of its branches are $(\Psi_{⅄}, \Psi_{И})$-closed. An $\mathcal{R}$-proof of
$\left(\frac{\Phi_{\mathsf{N}}}{\Phi_{\mathsf{Y}}} \,\middle|\, \frac{\Phi_{⅄}}{\Phi_{И}}\right)$ is a $(\Phi_{⅄}, \Phi_{И})$-closed $\mathcal{R}$-derivation $t$ with $\ell^{t}(\mathsf{rt}(t)) \subseteq (\Phi_{\mathsf{Y}}, \Phi_{\mathsf{N}})$. The
definitions of the (finitary) substitution-invariant B-consequence relation $\mathrel{\cdot|\cdot}_{\mathcal{R}}$
induced by a (finitary) SET²-SET² H-system $\mathcal{R}$ and of $\Theta$-analyticity are obvious
generalizations of the corresponding SET-SET definitions.

In [17], the notion of sufficient expressiveness was generalized to
nd-B-matrices. We reproduce here the main definitions for self-containment:


Definition 5. Let $\mathfrak{M} := \langle \mathbf{A}, \mathsf{Y}, \mathsf{N} \rangle$ be a $\Sigma$-nd-B-matrix.

- Given $X, Y \subseteq A$ and $\alpha \in \{\mathsf{Y}, \mathsf{N}\}$, we say that $X$ and $Y$ are $\alpha$-separated,
denoted by $X \#_\alpha Y$, if $X \subseteq \alpha$ and $Y \subseteq \tilde{\alpha}$, or vice-versa.
- Given distinct truth-values $x, y \in A$, a unary formula $S$ is a separator for
$(x, y)$ whenever $S_{\mathbf{A}}(x) \#_\alpha S_{\mathbf{A}}(y)$ for some $\alpha \in \{\mathsf{Y}, \mathsf{N}\}$. If there is a separator
for each pair of distinct truth-values in $A$, then $\mathfrak{M}$ is said to be sufficiently
expressive.


In the same work [17], the axiomatization algorithm of [13] was also
generalized, guaranteeing that every sufficiently expressive nd-B-matrix $\mathfrak{M}$ is
axiomatizable by a $\Theta$-analytic SET²-SET² H-system, which is finite whenever $\mathfrak{M}$ is finite,
where $\Theta$ is a set of separators for the pairs of truth-values of $\mathfrak{M}$. Note that, in
the second bullet of the above definition, a unary formula is characterized as
a separator whenever it separates a pair of truth-values according to at least
one of the distinguished sets of values. This means that having two of such sets
may allow us to separate more pairs of truth-values than having a single set,
that is, the nd-B-matrices are, in this sense, potentially more expressive than
the (one-dimensional) logical matrices.


Example 2. Let $\mathbf{A}$ be the $\Sigma$-nd-algebra from Example 1, and consider the
nd-B-matrix $\mathfrak{M} := \langle \mathbf{A}, \{a\}, \{b\} \rangle$. As we know, in this matrix the pair $(b, c)$ is not
separable if we consider only the set of designated values $\{a\}$. However, as we
now have the set $\{b\}$ of antidesignated truth-values, the separation becomes
evident: the propositional variable $p$ is a separator for this pair now, since $b \in \{b\}$
and $c \notin \{b\}$. The recipe from [17] produces the following SET²-SET²
axiomatization for $\mathfrak{M}$, with only three very simple schematic rules of inference:

$$\frac{p \,\|\, p}{\;\|\;} \qquad \frac{\;\|\;}{p, g(p) \,\|\, p} \qquad \frac{\;\|\; p}{\;\|\; h(p)}$$

By construction, the one-dimensional logic determined by the nd-matrix of
Example 1 inhabits the t-aspect of $\mathrel{\cdot|\cdot}_{\mathfrak{M}}$; thus it can be seen as being
axiomatized by this finite and analytic two-dimensional system (contrast with the
infinite SET-SET axiomatization known for this logic, provided in that same
example).
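The soundness of these three rules, as well as that of the one-dimensional schemas of Example 1 read in the t-aspect, can be checked directly against (B-ent). The following Python sketch assumes our reading of the tables of Example 1; `b_holds`, `valuations` and the tuple encoding of formulas are our own illustrative names. It enumerates every legal valuation restricted to the relevant subformulas and looks for a countermodel.

```python
from itertools import product

VALUES = ("a", "b", "c")
Y = frozenset("a")   # designated
N = frozenset("b")   # antidesignated
OPS = {  # our reading of Example 1
    "g": lambda x: {"a"} if x == "c" else set(VALUES),
    "h": lambda x: {"b"} if x == "b" else set(VALUES),
}


def subformulas(phi, acc):
    if isinstance(phi, tuple):
        subformulas(phi[1], acc)
    if phi not in acc:
        acc.append(phi)
    return acc


def valuations(formulas):
    """Yield each legal valuation restricted to the subformulas of `formulas`."""
    subs = []
    for phi in formulas:
        subformulas(phi, subs)
    atoms = [s for s in subs if isinstance(s, str)]
    compounds = [s for s in subs if isinstance(s, tuple)]

    def extend(v, i):
        if i == len(compounds):
            yield v
            return
        op, arg = compounds[i]
        for value in OPS[op](v[arg]):
            yield from extend({**v, compounds[i]: value}, i + 1)

    for choice in product(VALUES, repeat=len(atoms)):
        yield from extend(dict(zip(atoms, choice)), 0)


def b_holds(acc, rej, non_acc, non_rej):
    """(B-ent): no valuation accepts `acc`, rejects `rej`,
    non-accepts `non_acc` and non-rejects `non_rej`."""
    for v in valuations([*acc, *rej, *non_acc, *non_rej]):
        if (all(v[f] in Y for f in acc) and all(v[f] in N for f in rej)
                and all(v[f] not in Y for f in non_acc)
                and all(v[f] not in N for f in non_rej)):
            return False  # v is a countermodel
    return True


p = "p"


def g(a):
    return ("g", a)


def h(a):
    return ("h", a)


def h_iter(i, a):
    for _ in range(i):
        a = h(a)
    return a


rules = [
    ([p], [p], [], []),         # p || p  over  ||
    ([], [], [p, g(p)], [p]),   # ||  over  p, g(p) || p
    ([], [p], [], [h(p)]),      # || p  over  || h(p)
]
print(all(b_holds(*r) for r in rules))                                      # True
print(all(b_holds([h_iter(i, p)], [], [p, g(p)], []) for i in range(6)))  # True
print(b_holds([g(p)], [], [p], []))  # False: v(p) = b, v(g(p)) = a
```

The arguments of `b_holds` follow the four components of a B-statement in the order acceptance, rejection, non-acceptance, non-rejection, so a one-dimensional consecution $\Phi \rhd \Psi$ in the t-aspect is checked as `b_holds(Phi, [], Psi, [])`.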


We constructed above a $\Sigma$-nd-B-matrix from two $\Sigma$-nd-matrices in such a
way that the one-dimensional logics determined by the latter are fully recoverable
from the former. We formalize this construction below:


Definition 6. Let $\mathbb{M} := \langle \mathbf{A}, D \rangle$ and $\mathbb{M}' := \langle \mathbf{A}, D' \rangle$ be $\Sigma$-nd-matrices. The
B-product between $\mathbb{M}$ and $\mathbb{M}'$ is the $\Sigma$-nd-B-matrix $\mathbb{M} \odot \mathbb{M}' := \langle \mathbf{A}, D, D' \rangle$.


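Note that $\Phi \rhd_{\mathbb{M}} \Psi$ iff $\frac{\varnothing}{\Phi} \mathrel{\cdot|\cdot}_{\mathbb{M} \odot \mathbb{M}'} \frac{\Psi}{\varnothing}$ iff $\Phi \rhd^{\mathsf{t}}_{\mathbb{M} \odot \mathbb{M}'} \Psi$, and $\Phi \rhd_{\mathbb{M}'} \Psi$ iff $\frac{\Phi}{\varnothing} \mathrel{\cdot|\cdot}_{\mathbb{M} \odot \mathbb{M}'} \frac{\varnothing}{\Psi}$
iff $\Phi \rhd^{\mathsf{f}}_{\mathbb{M} \odot \mathbb{M}'} \Psi$. Therefore, $\rhd_{\mathbb{M}}$ and $\rhd_{\mathbb{M}'}$ are easily recoverable from $\mathrel{\cdot|\cdot}_{\mathbb{M} \odot \mathbb{M}'}$,
since they inhabit, respectively, the t-aspect and the f-aspect of the latter. One
of the applications of this novel way of putting two distinct logics together
was illustrated in that same Example 2 to produce a two-dimensional analytic
and finite axiomatization for a one-dimensional logic characterized by a
$\Sigma$-nd-matrix. As we have shown, the latter one-dimensional logic need not be
finitely axiomatizable by a SET-SET H-system. We present this application of
B-products with more generality below: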


Proposition 2. Let $\mathbb{M} := \langle \mathbf{A}, D \rangle$ be a $\Sigma$-nd-matrix and suppose that $U \subseteq A \times A$
contains all and only the pairs of distinct truth-values that fail to be separable in
$\mathbb{M}$. If, for some $\mathbb{M}' := \langle \mathbf{A}, D' \rangle$, the pairs in $U$ are separable in $\mathbb{M}'$, then $\mathbb{M} \odot \mathbb{M}'$ is
sufficiently expressive (thus, axiomatizable by an analytic SET²-SET² H-system
that is finite whenever $A$ is finite).
6 A Finite and Analytic Proof System for mCi


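In the spirit of Proposition 2, we define below an nd-B-matrix by combining
the matrices $\mathbb{M}_{\mathbf{mCi}} := \langle \mathbf{A}_5, Y_5 \rangle$ and $\mathbb{M}^{\mathsf{N}}_{\mathbf{mCi}} := \langle \mathbf{A}_5, N_5 \rangle$ introduced in Sect. 4
(Definition 1 and Definition 4):


Definition 7. Let $\mathfrak{M}_{\mathbf{mCi}} := \mathbb{M}_{\mathbf{mCi}} \odot \mathbb{M}^{\mathsf{N}}_{\mathbf{mCi}} = \langle \mathbf{A}_5, Y_5, N_5 \rangle$, with $Y_5 := \{I, T, t\}$
and $N_5 := \{f, I, T\}$.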


When we now consider both sets $Y_5$ and $N_5$ of designated and antidesignated
truth-values, the separation of all truth-values of $\mathbf{A}_5$ becomes possible, that is,
$\mathfrak{M}_{\mathbf{mCi}}$ is sufficiently expressive, as guaranteed by Proposition 2. Furthermore,
notice that we have two alternatives for separating the pairs $(I, t)$ and $(I, T)$:
either using the formula $\neg p$ or the formula $\circ p$. With this finite sufficiently
expressive nd-B-matrix in hand, producing a finite $\{p, \circ p\}$-analytic two-dimensional
H-system for it is immediate by [17, Theorem 2]. Since mCi inhabits the t-aspect
of $\mathrel{\cdot|\cdot}_{\mathfrak{M}_{\mathbf{mCi}}}$, we may then conclude that:


Theorem 5. mCi is axiomatizable by a finite and analytic two-dimensional
H-system.
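The separation claims for $\mathfrak{M}_{\mathbf{mCi}}$ only involve $p$, $\neg p$ and $\circ p$, whose possible values at $x$ are just $\{x\}$, $\neg_{\mathbf{A}_5}(x)$ and $\circ_{\mathbf{A}_5}(x)$, so they can be checked exhaustively. The following Python sketch uses our own encoding of Definitions 1 and 7; `alpha_separated` and `witnesses` are illustrative helper names.

```python
from itertools import combinations

V5 = ("f", "F", "I", "T", "t")
Y5 = frozenset({"I", "T", "t"})   # designated values
N5 = frozenset({"f", "I", "T"})   # antidesignated values
NEG = {"f": {"I", "t"}, "F": {"T"}, "I": {"I", "t"}, "T": {"F"}, "t": {"f"}}
CIRC = {"f": {"T"}, "F": {"T"}, "I": {"F"}, "T": {"T"}, "t": {"T"}}

SEPARATORS = {"p": lambda x: {x}, "¬p": NEG.__getitem__, "∘p": CIRC.__getitem__}


def alpha_separated(X, Z, D):
    """X #_alpha Z, where D is the value set of alpha (Y5 or N5)."""
    return (X <= D and not Z & D) or (Z <= D and not X & D)


def witnesses(x, y, names):
    """All (separator, dimension) pairs among `names` that separate x and y."""
    return [(s, dim) for s in names for dim, D in (("Y", Y5), ("N", N5))
            if alpha_separated(SEPARATORS[s](x), SEPARATORS[s](y), D)]


print(all(witnesses(x, y, ["p", "∘p"]) for x, y in combinations(V5, 2)))  # True
print(witnesses("I", "T", ["p"]))                                         # []
print(witnesses("I", "T", ["¬p", "∘p"]))  # [('¬p', 'Y'), ('∘p', 'Y'), ('∘p', 'N')]
```

The first line confirms that $\{p, \circ p\}$ suffices as a set of separators, and the last shows both alternatives for the pair $(I, T)$.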


Our axiomatization recipe delivers an H-system with about 300 rule schemas. 
When we simplify it using the streamlining procedures indicated in that paper, 
we obtain a much more succinct and insightful presentation, with 28 rule 
schemas, which we call Amci. The full presentation of this system is given below: 


[Display of the 28 rule schemas of $\mathcal{R}_{\mathbf{mCi}}$ (rules for $\supset$, $\wedge$, $\vee$, $\circ$ and $\neg$); not recoverable from the extracted text.]


Note that the set of rules $\{\circledast^{\mathbf{mCi}}_i \mid \circledast \in \{\wedge, \vee, \supset\},\ i \in \{1, 2, 3\}\}$ makes it
clear that the t-aspect of the induced B-consequence relation is inhabited by a
logic extending positive classical logic, while the remaining rules for these
connectives involve interactions between the two dimensions. Also, one of the rules of
$\mathcal{R}_{\mathbf{mCi}}$ indicates that $\circ$ satisfies one of the main conditions for being taken as a consistency
connective in the logic inhabiting the t-aspect. In fact, all these observations
are aligned with the fact that the logic inhabiting the t-aspect of $\mathrel{\cdot|\cdot}_{\mathcal{R}_{\mathbf{mCi}}}$ is
precisely mCi. See, in Fig. 3, $\mathcal{R}_{\mathbf{mCi}}$-derivations showing that, in mCi, $\neg\circ p$ and
$p \wedge \neg p$ are logically equivalent and that $\circ\neg\circ p$ is a theorem.
[Figure: three $\mathcal{R}_{\mathbf{mCi}}$-derivation trees.]

Fig. 3. $\mathcal{R}_{\mathbf{mCi}}$-derivations showing, respectively, that $\frac{\varnothing}{p \wedge \neg p} \mathrel{\cdot|\cdot}_{\mathcal{R}_{\mathbf{mCi}}} \frac{\neg\circ p}{\varnothing}$, $\frac{\varnothing}{\neg\circ p} \mathrel{\cdot|\cdot}_{\mathcal{R}_{\mathbf{mCi}}} \frac{p \wedge \neg p}{\varnothing}$
and $\frac{\varnothing}{\varnothing} \mathrel{\cdot|\cdot}_{\mathcal{R}_{\mathbf{mCi}}} \frac{\circ\neg\circ p}{\varnothing}$. Note that, for a cleaner presentation, we omit the formulas inherited
from parent nodes.


7 Concluding Remarks 


In this work, we introduced a mechanism for combining two non-deterministic 
logical matrices into a non-deterministic B-matrix, creating the possibility of pro- 
ducing finite and analytic two-dimensional axiomatizations for one-dimensional 
logics that may fail to be finitely axiomatizable in terms of one-dimensional 
Hilbert-style systems. It is worth mentioning that, as proved in [17], one may 
perform proof search and countermodel search over the resulting two-dimensional
systems in time at most exponential in the size of the B-statement of interest
through a straightforward proof-search algorithm.

We illustrated the above-mentioned combination mechanism with two exam- 
ples, one of them corresponding to a well-known logic of formal inconsistency 
called mCi. We ended up proving not only that this logic is not finitely axiom- 
atizable in one dimension, but also that it is the limit of a strictly increasing 
chain of LFIs extending the logic mbC. From the perspective of the study of
B-consequence relations, these examples allow us to eliminate the suspicion that a
two-dimensional H-system $\mathcal{R}$ may always be converted into SET-SET H-systems
for the logics inhabiting the one-dimensional aspects of $\mathrel{\cdot|\cdot}_{\mathcal{R}}$ without losing any
desirable property (in this case, finiteness of the presentation).

At first sight, the formalism of two-dimensional H-systems may be confused 
with the formalism of n-sided sequents [3,4], in which the objects manipulated 
by rules of inference (the so-called n-sequents) accommodate more than two sets 
of formulas in their structures. The reader interested in a comparison between 
these two different approaches is referred to the concluding remarks of [17]. 

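We close with some observations regarding $\mathfrak{M}_{\mathbf{mCi}}$ and the two-dimensional
H-system $\mathcal{R}_{\mathbf{mCi}}$. A one-dimensional logic $\rhd$ is said to be $\neg$-consistent when
$p, \neg p \rhd \varnothing$ and $\neg$-determined when $\varnothing \rhd \varphi, \neg\varphi$ for all $\varphi \in L_\Sigma(P)$. A B-consequence
relation $\mathrel{\cdot|\cdot}$ is said to allow for gappy reasoning when $\frac{\varnothing}{\varnothing} \mathrel{\cdot\nmid\cdot} \frac{\varphi}{\varphi}$ and to allow for glutty
reasoning when $\frac{\varphi}{\varphi} \mathrel{\cdot\nmid\cdot} \frac{\varnothing}{\varnothing}$, for some $\varphi \in L_\Sigma(P)$. Notice that $\neg$-determinedness in the
logic inhabiting the t-aspect of a B-consequence relation by no means implies
the disallowance of gappy reasoning in the two-dimensional setting: we still have
$F \notin Y_5 \cup N_5$, so one may both non-accept and non-reject a formula $\varphi$ in $\mathrel{\cdot|\cdot}_{\mathcal{R}_{\mathbf{mCi}}}$,
even though non-accepting both $\varphi$ and its negation in mCi is not possible, in
view of the rules of $\mathcal{R}_{\mathbf{mCi}}$ for $\neg$. Similarly, the recovery of $\neg$-consistency achieved via $\circ$ in
such logic does not coincide with the gentle disallowance of glutty reasoning in
$\mathrel{\cdot|\cdot}_{\mathcal{R}_{\mathbf{mCi}}}$, that is, we do not have, in general, $\frac{\varphi}{\circ\varphi, \varphi} \mathrel{\cdot|\cdot}_{\mathcal{R}_{\mathbf{mCi}}} \frac{\varnothing}{\varnothing}$ or $\frac{\circ\varphi, \varphi}{\varphi} \mathrel{\cdot|\cdot}_{\mathcal{R}_{\mathbf{mCi}}} \frac{\varnothing}{\varnothing}$, even
though for binary compounds both are derivable in view of the rules of $\mathcal{R}_{\mathbf{mCi}}$ for
$\circledast \in \{\wedge, \vee, \supset\}$ and for $\circ$. With these observations we hope to call attention to
the fact that B-consequence relations open the doors for further developments
concerning the study of paraconsistency (and, dually, of paracompleteness), as
well as the study of recovery operators [8].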
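These observations can be confirmed semantically, since $\mathcal{R}_{\mathbf{mCi}}$ axiomatizes $\mathfrak{M}_{\mathbf{mCi}}$. The Python sketch below uses our own encoding of the tables; `b_holds` is an illustrative helper implementing (B-ent) by enumerating valuations restricted to the relevant subformulas. It finds a gappy valuation for $p$, confirms that the t-aspect is $\neg$-determined for $p$, exhibits the glutty countermodel $v(p) = T$ for the two statements involving $\circ$, and shows that both statements hold when $\varphi$ is a binary compound of two variables.

```python
from itertools import product

V5 = ("f", "F", "I", "T", "t")
Y5 = frozenset({"I", "T", "t"})   # designated
N5 = frozenset({"f", "I", "T"})   # antidesignated
NEG = {"f": {"I", "t"}, "F": {"T"}, "I": {"I", "t"}, "T": {"F"}, "t": {"f"}}
CIRC = {"f": {"T"}, "F": {"T"}, "I": {"F"}, "T": {"T"}, "t": {"T"}}
OPS = {
    "and": lambda x, y: {"f"} if x not in Y5 or y not in Y5 else {"I", "t"},
    "or": lambda x, y: {"I", "t"} if x in Y5 or y in Y5 else {"f"},
    "imp": lambda x, y: {"f"} if x in Y5 and y not in Y5 else {"I", "t"},
    "not": NEG.__getitem__,
    "circ": CIRC.__getitem__,
}


def subformulas(phi, acc):
    if isinstance(phi, tuple):
        for arg in phi[1:]:
            subformulas(arg, acc)
    if phi not in acc:
        acc.append(phi)
    return acc


def valuations(formulas):
    subs = []
    for phi in formulas:
        subformulas(phi, subs)
    atoms = [s for s in subs if isinstance(s, str)]
    compounds = [s for s in subs if isinstance(s, tuple)]

    def extend(v, i):
        if i == len(compounds):
            yield v
            return
        op, *args = compounds[i]
        for value in OPS[op](*(v[a] for a in args)):
            yield from extend({**v, compounds[i]: value}, i + 1)

    for choice in product(V5, repeat=len(atoms)):
        yield from extend(dict(zip(atoms, choice)), 0)


def b_holds(acc, rej, non_acc, non_rej):
    """(B-ent) for the B-matrix <A5, Y5, N5>: True iff there is no countermodel."""
    for v in valuations([*acc, *rej, *non_acc, *non_rej]):
        if (all(v[f] in Y5 for f in acc) and all(v[f] in N5 for f in rej)
                and all(v[f] not in Y5 for f in non_acc)
                and all(v[f] not in N5 for f in non_rej)):
            return False
    return True


p, q = "p", "q"
print(b_holds([], [], [p], [p]))             # False: v(p) = F is a gap
print(b_holds([], [], [p, ("not", p)], []))  # True: the t-aspect is ¬-determined
print(b_holds([("circ", p), p], [p], [], []),
      b_holds([p], [("circ", p), p], [], []))  # False False: v(p) = T
for op in ("and", "or", "imp"):
    phi = (op, p, q)
    # prints "<op> True True" for each of the three binary connectives
    print(op, b_holds([("circ", phi), phi], [phi], [], []),
          b_holds([phi], [("circ", phi), phi], [], []))
```

Binary compounds only take values in $\{f, I, t\}$, so their only glutty value is $I$, and $\circ_{\mathbf{A}_5}(I) = \{F\}$ is neither designated nor antidesignated; this is the semantic reason both statements hold for them.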
Abstract. Treating a saturation-based automatic theorem prover
(ATP) as a Las Vegas randomized algorithm is a way to illuminate the 
chaotic nature of proof search and make it amenable to study by prob- 
abilistic tools. On a series of experiments with the ATP Vampire, the 
paper showcases some implications of this perspective for prover evalua- 
tion. 


Keywords: Saturation-based proving - Evaluation - Randomization


1 Introduction 


Saturation-based proof search is known to be fragile. Even seemingly insignificant 
changes in the search procedure, such as shuffling the order in which input 
formulas are presented to the prover, can have a huge impact on the prover’s 
running time and thus on the ability to find a proof within a given time limit. 

This chaotic aspect of the prover behaviour is relatively poorly understood, 
yet has obvious consequences for evaluation. A typical experimental evaluation 
of a new technique T compares the number of problems solved by a baseline 
run with a run enhanced by T (over an established benchmark and with a fixed 
timeout). While a higher number of problems solved by the run enhanced by 
T indicates a benefit of the new technique, it is hard to claim that a certain 
problem P is getting solved thanks to T. It might be that T just helps the 
prover get lucky on P by a complicated chain of cause and effect not related to 
the technique T—and the original idea behind it—in any reasonable sense. 

We propose to expose and counter the effect of chaotic behaviours by delib- 
erately injecting randomness into the prover and observing the results of many 
independently seeded runs. Although computationally more costly than stan- 
dard evaluation, such an approach promises to bring new insights. We gain the 
ability to apply the tools of probability theory and statistics to analyze the 
results, assign confidences, and single out those problems that robustly benefit 
from the evaluated technique. At the same time, by observing the changes in
the corresponding runtime distributions we can even meaningfully establish the 
effect of the new technique on a single problem in isolation, something that is 
normally inconclusive due to the threat of chaotic fluctuations. 
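As a minimal illustration of the kind of statistics this enables (our own sketch, not the evaluation code used in the paper), the following Python functions estimate, from independently seeded runs on one problem, the probability that a run succeeds within a given budget, together with an approximate 95% Wilson score interval, and sum such estimates over problems to obtain the expected number of problems solved within a budget. The choice of interval, the function names and the toy numbers are assumptions made for the example.

```python
import math


def solve_probability(runs, budget, z=1.96):
    """Estimate P(solved within `budget`) from independently seeded runs.

    `runs` has one entry per seed: the cost of the successful run (e.g. in
    instructions), or None if that run did not find a proof.
    Returns (point estimate, lower, upper) of an approximate Wilson interval."""
    n = len(runs)
    k = sum(1 for r in runs if r is not None and r <= budget)
    p_hat = k / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return p_hat, max(0.0, centre - half), min(1.0, centre + half)


def expected_solved(runs_per_problem, budget):
    """Expected number of problems solved by one random run per problem within `budget`."""
    return sum(solve_probability(runs, budget)[0] for runs in runs_per_problem.values())


# Toy data, made up for illustration: costs of ten seeded runs on one problem.
runs = [120, None, 95, 300, None, 80, 110, None, 250, 90]
print(solve_probability(runs, budget=200))
print(expected_solved({"P1": runs, "P2": [10, 20]}, budget=200))  # 1.5
```

By linearity of expectation, summing per-problem success probabilities gives the average number of problems a single randomized run would solve within the budget.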

In this paper, we report on several experiments with a randomized version 
of the ATP Vampire [9]. After explaining the method in more detail (Sect. 2), 
we first demonstrate the extent to which the success of a typical Vampire
proof search strategy can be ascribed to chance (Sect. 3). Next, we use the
collected data to highlight the specifics of comparing two strategies
probabilistically (Sect. 4). Finally, we focus on a single problem to see a chaotic behaviour
smoothed into a distribution with a high variance (Sect. 5). The paper ends
with an overview of related work (Sect. 6) and a discussion (Sect. 7). 


2 Randomizing Out Chaos 


Any developer of a saturation-based prover will confirm that the behaviour of a 
specific proving strategy on a specific problem is extremely hard to predict, that 
a typical experimental evaluation of a new technique (such as the one described 
earlier) invariably leads to both gains and losses in terms of the solved problems, 
and that a closer look at any of the “lost” problems often reveals just a com- 
plicated chain of cause and effect that steers the prover away from the original 
path (rather than a simple opportunity to improve the technique further). 
These observations provide indirect evidence that the prover’s behaviour is
chaotic: A specific prover run can be likened to a single bead falling down through
the pegs of the famous Galton board¹. The bead follows a deterministic
trajectory, but only because the code fixes every single detail of the execution,
including many which the programmer did not care about and which were left as they
are merely by coincidence. We put forward here that any such fixed detail
(which does not contribute to an officially implemented heuristic) represents a 
candidate location for randomization, since a different programmer could have 
fixed the detail differently and we would still call the code essentially the same. 


Implementation: We implemented randomization on top of Vampire version
4.6.1; the code is available as a separate git branch². We divided the
randomization opportunities into three groups (governed by three new Vampire options).

Shuffling the input (-si on) randomly reorders the input formulas and, 
recursively, sub-formulas under commutative logical operations. This is done 
several times throughout the preprocessing pipeline, at the end of which a fin- 
ished clause normal form is produced. Randomizing traversals (-rtra on) hap- 
pens during saturation and consists of several randomized reorderings including: 
reordering literals in a newly generated clause and in each given clause before 
activation, and shuffling the order in which generated clauses are put into the 


1 https://en.wikipedia.org/wiki/Galton_board.
2 https://github.com/vprover/vampire/tree/randire.
Fig. 1. Blue: first-order TPTP problems ordered by the decreasing probability of being
solved by the dis10 strategy within the 50 billion instruction limit. Red: a cactus plot
for the same strategy, showing the dependence between a given instruction budget
(y-axis) and the number of problems solved on average within that budget (x-axis).
(Color figure online)


passive set. It also (partially) randomizes term ids, which are used as tiebreak- 
ers in various term indexing operations and determine the default orientation of 
equational literals in the term sharing structure. Finally, “randomized age-weight 
ratio” (-rawr on) swaps the default, deterministic mechanism for choosing the 
next queue to select the given clause from [13] for a randomized one (which only 
respects the age-weight ratio probabilistically). 

All three options were active by default during our experiments.
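The difference between a deterministic queue-selection mechanism and its randomized replacement under -rawr on can be illustrated with a small sketch. It is not Vampire's code: the balance-counter scheme in `deterministic_selector` is a hypothetical stand-in for the deterministic mechanism of [13], and both function names are ours. The randomized variant picks the age-ordered queue with probability age/(age + weight) independently at every step, so it respects the ratio only probabilistically.

```python
import itertools
import random
from typing import Iterator


def deterministic_selector(age: int, weight: int) -> Iterator[str]:
    """Choose the next queue with a balance counter (a hypothetical scheme).

    The sequence is periodic with period age + weight, so every block of
    age + weight consecutive picks contains exactly `age` age-queue picks.
    """
    balance = 0
    while True:
        if balance <= 0:
            balance += weight
            yield "age"
        else:
            balance -= age
            yield "weight"


def randomized_selector(age: int, weight: int, rng: random.Random) -> Iterator[str]:
    """Pick the age queue with probability age / (age + weight), independently
    at every step; the ratio is only respected on average."""
    p_age = age / (age + weight)
    while True:
        yield "age" if rng.random() < p_age else "weight"


# Usage with the 1:10 ratio of the dis10 strategy.
det = list(itertools.islice(deterministic_selector(1, 10), 110))
rnd = list(itertools.islice(randomized_selector(1, 10, random.Random(1)), 110))
print(det.count("age"))  # 10
print(rnd.count("age"))  # varies with the seed
```

The seeded `random.Random` instance makes a single randomized run reproducible, while different seeds yield the independent samples that the experiments below rely on.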


3 Experiment 1: A Single-Strategy View 


First, we set out to establish to what degree the performance of a Vampire
strategy can be affected by randomization. We chose the default strategy of the
prover except for the saturation algorithm, which we set to Discount, and the
age-weight ratio, which we set to 1:10 (calling the strategy dis10). We ran our
experiment on the first-order problems from the TPTP library [15] version 7.5.0³.

To collect our data, we repeatedly (with different seeds) ran the prover on
the problems, performing full randomization. We measured the executed
instructions⁴ needed to successfully solve a problem and used a limit of 50 billion
instructions (which roughly corresponds to 15 s of running time on our machine⁵),
after which a run was declared unsuccessful. We ran the prover 10 times on each
problem, and additionally as many times as required to observe the instruction
count average (over both successful and unsuccessful runs) stabilize to within 1%
of each of its 10 previously recorded values⁶.
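The stopping rule can be sketched as follows, under our reading of it: after each run we record the running average of the instruction counts, and we stop once at least 10 runs have been made and the current average is within 1% of each of the 10 averages recorded before it. The callable `run(seed)`, the `max_runs` safety cap, and counting an unsuccessful run at the 50 billion limit are assumptions made for illustration; `fake_prover` merely draws random numbers in place of a real prover.

```python
import random
from typing import Callable, List

LIMIT = 50_000_000_000  # instruction limit; failed runs are counted at this value (assumption)


def sample_until_stable(run: Callable[[int], int], min_runs: int = 10,
                        window: int = 10, tol: float = 0.01,
                        max_runs: int = 10_000) -> List[int]:
    """Call run(seed) with fresh seeds until the running average of the
    instruction counts is within `tol` (relative) of each of the `window`
    previously recorded running averages."""
    counts: List[int] = []
    averages: List[float] = []
    total = 0
    for seed in range(max_runs):
        count = min(run(seed), LIMIT)
        counts.append(count)
        total += count
        averages.append(total / len(counts))
        if len(counts) < min_runs or len(averages) <= window:
            continue  # not enough history yet
        current = averages[-1]
        if all(abs(current - prev) <= tol * prev for prev in averages[-window - 1:-1]):
            break
    return counts


def fake_prover(seed: int) -> int:
    """Stand-in for a real prover run: a log-normally distributed instruction count."""
    return int(random.Random(seed).lognormvariate(20.0, 1.0))


runs = sample_until_stable(fake_prover)
print(len(runs), sum(runs) / len(runs))
```

With a constant `run`, the earliest possible stop is after 11 runs, since the first check needs 10 earlier averages to compare against.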

A summary view of the experiment is given by Fig. 1. The most important thing
to notice there is the shaded region, which spans 965 problems that were solved by

3 Materials accompanying the experiments can be found at https://bit.ly/3JDCwea. 
4 As measured via the perf_event_open Linux performance monitoring feature. 

5 A server with Intel(R) Xeon(R) Gold 6140 CPUs @ 2.3 GHz and 500 GB RAM. 

6 Utilizing all 72 cores of our machine, such data collection took roughly 12 h.
Fig. 2. The effect of turning AVATAR off in the dis10 strategy (cf. Fig. 1).


dis10 at least once but not by every run. In other words, these problems have a
probability p of being solved with 0 < p < 1. This is a relatively large number
and can be compared to the 8720 “easy” problems solved by every run. The
collected data implies that 9319.1 problems are solved on average (marked
by the left-most dashed line in Fig. 1), with a standard deviation σ = 11.7. The
latter should be an interesting indicator for prover developers: beating a baseline
by only 12 TPTP problems can easily be ascribed just to chance.
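One way to relate per-problem success probabilities to such a mean and standard deviation is to treat each problem as an independent Bernoulli trial, so that the number of problems solved in one run has mean $\sum_i p_i$ and variance $\sum_i p_i(1 - p_i)$. This is a sketch under that independence assumption, not necessarily how the figures above were computed; under this model only the problems with 0 < p < 1 contribute to the variance.

```python
import math
from typing import Iterable, Tuple


def solved_count_stats(probs: Iterable[float]) -> Tuple[float, float]:
    """Mean and standard deviation of the number of problems solved in one run,
    assuming problem i is solved independently with probability probs[i]."""
    ps = list(probs)
    mean = sum(ps)
    variance = sum(p * (1.0 - p) for p in ps)  # p = 0 or p = 1 adds nothing
    return mean, math.sqrt(variance)


# Hypothetical: three problems solved by every run and four coin flips.
print(solved_count_stats([1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5]))  # (5.0, 1.0)
```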

Figure 1 also contains the obligatory “cactus plot” (explained in the caption),
which, thanks to the collected data, can be constructed with the “on average”
qualifier. By definition, the plot reaches the left-most dashed line for the full
instruction budget of 50 billion. The subsequent dashed lines mark the number
of problems we would on average expect to solve by running the prover
(independently) on each problem two, three, four and five times. This information
is relevant for strategy scheduling: e.g., one can expect to solve 137 additional
problems by running randomized dis10 for a second time.
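If a single run solves problem i with probability $p_i$ and runs are independent, then k runs solve it at least once with probability $1 - (1 - p_i)^k$, so the expected number of problems covered is $\sum_i (1 - (1 - p_i)^k)$. The sketch below computes this quantity for hypothetical probabilities; it assumes, as in the text, that the repeated runs are independently seeded.

```python
from typing import Iterable


def expected_solved(probs: Iterable[float], k: int) -> float:
    """Expected number of problems solved by at least one of k independently
    seeded runs, where one run solves problem i with probability probs[i]."""
    return sum(1.0 - (1.0 - p) ** k for p in probs)


probs = [1.0, 0.5, 0.1, 0.0]  # hypothetical per-problem probabilities
for k in range(1, 6):
    print(k, round(expected_solved(probs, k), 4))
```

Problems with p = 1 or p = 0 never change the count, so the gain from extra runs comes entirely from the shaded region of Fig. 1.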

Not every strategy exhibits the same degree of variability under
randomization. Observe Fig. 2 with a plot analogous to Fig. 1, but for dis10 in which
AVATAR [16] has been turned off. The shaded area there is now much smaller
(and only spans 448 problems). The powerful AVATAR architecture thus stands
convicted of making proof search more fragile and the prover less robust⁷.


Remark. Randomization incurs a small but measurable computational
overhead. On a single run of dis10 over the first-order TPTP (filtering out cases
that took less than 1 s to finish, to prevent distortion by rounding errors), the
observed median relative time spent randomizing on a single problem was 0.47%,
the average 0.59%, and the worst⁸ 13.86%. Without randomization, the dis10
strategy solved 9335 TPTP problems under the 50 billion instruction limit, i.e.,
16 problems more than the average reported above. Such is the price we pay for
turning our prover into a Las Vegas randomized algorithm.


7 Another example of a strong but fragile heuristic is the lookahead literal selection
[5], which selects literals in a clause based on the current content of the active set:
dis10 enhanced with lookahead solves 9512.4 (±13.8) TPTP problems on average,
8672 problems with p = 1 and an additional 1382 (!) problems with 0 < p < 1.

8 On the hard-to-parse, trivial-to-solve HWV094-1 with 361 199 clauses. 
Fig. 3. Scatter plots comparing probabilities of solving a TPTP problem by the baseline
dis10 strategy and 1) dis10 with AVATAR turned off (left), and 2) dis10 with blocked 
clause elimination turned on (right). On problems marked red the respective technique 
could not be applied (no splittable clauses derived / no blocked clauses eliminated). 


4 Experiment 2: Comparing Two Strategies 


Once randomized performance profiles of multiple strategies are collected, it is
interesting to look at two at a time. Figure 3 shows two very different scatter
plots, each comparing our baseline dis10 to its modified version in terms of the 
probabilities of solving individual problems. 

On the left we see the effect of turning AVATAR off. The technique affects
the proving landscape quite a lot, and most problems have their mark along the
edges of the plot, where at least one of the two probabilities has the extreme
value of either 0 or 1. What the plot does not show well is how many marks end
up at the extreme corners. These are: 7896 problems easy for both, 661 easy with
AVATAR and hard without, and 135 hard with AVATAR and easy without.

Such “purified”, one-sided gains and losses constitute an interesting new
indicator of the impact of a given technique. They should be the first to look at,
e.g., during debugging, as they represent the most extreme but robust examples
of how the new technique changes the capabilities of the prover.
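Extracting these corner counts from two collected performance profiles is straightforward. In the sketch below, a profile is a hypothetical mapping from problem names to estimated success probabilities; we read “easy” as p = 1 and “hard” as p = 0, and report only the three corners discussed above.

```python
from collections import Counter
from typing import Dict


def corner_counts(baseline: Dict[str, float], modified: Dict[str, float]) -> Counter:
    """Count problems in the extreme corners of a probability scatter plot.

    'easy' means solved by every run (p = 1), 'hard' means solved by none (p = 0).
    """
    counts: Counter = Counter()
    for problem in baseline.keys() & modified.keys():
        p, q = baseline[problem], modified[problem]
        if p == 1.0 and q == 1.0:
            counts["easy for both"] += 1
        elif p == 1.0 and q == 0.0:
            counts["robust loss"] += 1  # easy for the baseline, hard after the change
        elif p == 0.0 and q == 1.0:
            counts["robust gain"] += 1  # hard for the baseline, easy after the change
    return counts


# Hypothetical profiles for five problems.
base = {"P1": 1.0, "P2": 0.0, "P3": 1.0, "P4": 0.5, "P5": 0.0}
new = {"P1": 1.0, "P2": 1.0, "P3": 0.0, "P4": 1.0, "P5": 0.0}
print(corner_counts(base, new))
```

With dis10 as the baseline and “AVATAR off” as the modification, the 661 problems mentioned above would be counted as robust losses and the 135 as robust gains.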

The right plot is an analogous view, but now of the effect of turning on blocked
clause elimination (BCE). This is a preprocessing technique coming from the
context of propositional satisfiability [7] and extended to first-order logic [8]. We
see that here most of the visible problems show up as marks along the plot’s main
diagonal, suggesting a (mostly) negligible effect of the technique. The extreme
corners hide: 8648 problems easy for both, 17 easy with BCE and hard without
(11 satisfiable and 6 unsatisfiable), and 2 easy without BCE and hard with it
(1 satisfiable and 1 unsatisfiable).
Fig. 4. 2D histograms of the relative frequencies (color scale) with which, given a
specific awr (x-axis), solving PRO017+2 required the shown number of instructions
(y-axis). The curves in pink highlight the mean y-value for every x. The performance of
dis10 (left) and of the same strategy enhanced by a goal-directed heuristic (right).
(Color figure online)


5 Experiment 3: Looking at One Problem at a Time 


In their paper on age/weight shapes [13, Fig. 2], Rawson and Reger plot the
number of given-clause loops required by Vampire to solve the TPTP problem
PRO017+2 as a function of the age/weight ratio (awr), a ratio specifying how often
the prover selects the next clause to activate from its age-ordered and
weight-ordered queues, respectively. The curve they obtain is quite “jiggly”, indicating
a fragile (discontinuous) dependence. Randomization allows us to smooth the
picture and reveal new, previously hidden, (probabilistic) patterns.

The 2D histogram in Fig. 4 (left) was obtained from 100 independently seeded
runs for each of 1200 distinct values of awr between 1:1024 = $2^{-10}$ and
4:1 = $2^{2}$. We can confirm Rawson and Reger’s observation that the best awr for
PRO017+2 lies at around 1:2. However, we can now also attempt to explain the
“jiggly-ness” of their curve: with a fragile proof search, even a slight change in
awr effectively corresponds to an independent sample from the prover’s execution
resource⁹ distribution, which, although changing continuously with awr, has
a high variance for our problem (note the log scale of the y-axis)¹⁰.

The distribution has another interesting property: at least for certain values
of awr, it is distinctly multi-modal. It is as if the prover either finds a proof quickly
(after a lucky event?) or only much later after a much harder effort, with almost
nothing in between. Shedding more light on this phenomenon is left for further research.

It is also very interesting to observe how such a 2D histogram changes
when we modify the proof search strategy. Figure 4 (right) shows the effect of
turning on SInE-level split queues [3], a goal-directed clause selection heuristic


9 Rawson and Reger [13] counted given-clause loops; we measure instructions.
10 Even with 100 samples for each value of awr, the mean instruction count (rendered 
in pink in Fig. 4) looks jiggly towards the weight-heavy end of the plot. 
(Vampire option -slsq on). We can see that the mean instruction count gets
worse (for every tried awr value) and the variance of the distribution also
distinctly increases. A curious effect of this is that we observe the shortest
successful runs with -slsq on, even though we still could not recommend this
heuristic to the user (in the case of PRO017+2). The probabilistic view makes us
realize that there are competing criteria of prover performance for which one might
want to optimize.


6 Related Work 


The idea of randomizing a theorem prover is not new. Ertel [2] studied the 
speedup potential of running independently seeded instances of the connection 
prover SETHEO [10]. The dashed lines in our Figs. 1 and 2 capture an analogous
notion in terms of “additional problems covered” for levels of parallelism 1 to 5.
randoCoP [12] is a randomized version of another connection prover, leanCoP 2.0 
[11]: especially in its incomplete setup, several restarts with different seeds helped 
randoCoP improve over leanCoP in terms of the number of solved problems. 

Gomes et al. [4] observe that randomized complete backtracking algorithms for
propositional satisfiability (SAT) lead to heavy-tailed runtime distributions on
satisfiable instances. While we have not yet analyzed the runtime distributions
coming from saturation-based first-order proof search in detail, we have definitely
observed high variance for unsatisfiable problems as well. Also in the domain of
SAT, Brglez et al. [1] proposed input shuffling as a way of turning a solver’s runtime
into a random variable and studied the corresponding distributions.

An interesting view on the trade-offs between expected performance of a 
randomized solver and the risk associated with waiting for an especially long 
run to finish is given by Huberman et al. [6]. This is related to the last remark 
of the previous section. 

Finally, in the satisfiability modulo theories (SMT) community, input
shuffling, or scrambling, has been discussed as an obfuscation measure in
competitions [17], where it is meant to prevent solvers from simply looking up a
precomputed answer upon recognising a previously seen problem. Also notable is the
use of randomization in solver debugging via fuzz testing [14,18].


7 Discussion 


As we have seen, the behaviour of a state-of-the-art saturation-based theorem
prover is to a considerable degree chaotic, and on many problems a mere
perturbation of seemingly unimportant execution details decides the success
or failure of the corresponding run. While this may be seen as a sign of our
as-yet imperfect grasp of the technology, the author believes that an equally
plausible view is that some form of chaos is inherent and originates from the
complexity of the theorem proving task itself. (Higher-order proof search
is expected to exhibit an even higher degree of fragility.)

This paper has proposed randomization as a key ingredient of a prover
evaluation method that takes the chaotic nature of proof search into account. The
extra cost required by the repeated runs, in itself not unreasonable to pay on
contemporary parallel hardware, seems more than compensated by the new insights
coming from the probabilistic picture that emerges. Moreover, other uses of
randomization are easy to imagine, such as data augmentation for machine learning
approaches or the construction of more robust strategy schedules. We feel that
we have only scratched the surface of the possibilities this opens up. More research
will be needed to fully harness the potential of this perspective.
Abstract. The long-run behaviour of linear dynamical systems is often
studied by looking at eventual properties of matrices and recurrences that
underlie the system. A basic problem in this setting is as follows: given
a set of pairs of rational weights and matrices $\{(w_1, A_1), \ldots, (w_m, A_m)\}$,
does there exist an integer N such that for all $n \geq N$,
$\sum_{i=1}^{m} w_i \cdot A_i^n \geq 0$ (resp. $> 0$)? We study this problem, its
applications and its connections to linear recurrence sequences. Our first result
is that for $m \geq 2$, the problem is as hard as the ultimate positivity of linear
recurrences, a long-standing open question (known to be coNP-hard). Our second
result is that for any $m \geq 1$, the problem reduces to ultimate positivity of
linear recurrences. This yields upper bounds for several subclasses of matrices by
exploiting known results on linear recurrence sequences. Our third result is a
general reduction technique for a large class of problems (including the above)
from the diagonalizable case to the case where the matrices are simple (have
non-repeated eigenvalues). This immediately gives a decision procedure
for our problem for diagonalizable matrices.


Keywords: Eventual properties of matrices · Ultimate positivity ·
Linear recurrence sequences


1 Introduction 


The study of eventual or asymptotic properties of discrete-time linear dynam- 
ical systems has long been of interest to both theoreticians and practitioners. 
Questions pertaining to (un)-decidability and/or computational complexity of 
predicting the long-term behaviour of such systems have been extensively stud- 
ied over the last few decades. Despite significant advances, however, there remain 
simple-to-state questions that have eluded answers so far. In this work, we inves- 
tigate one such problem, explore its significance and links with other known 
problems, and study its complexity and computability landscape. 


This work was partly supported by DST/CEFIPRA/INRIA Project EQuaVE and 
DST/SERB Matrices Grant MTR/2018/000744. 

Author names are in alphabetical order of last names. 

© The Author(s) 2022 


J. Blanchette et al. (Eds.): IJCAR 2022, LNAI 13385, pp. 671-690, 2022. 
https://doi.org/10.1007/978-3-031-10769-6_39 


672 S. Akshay et al. 


The time-evolution of linear dynamical systems is often modeled using lin- 
ear recurrence sequences, or using sequences of powers of matrices. Asymptotic 
properties of powers of matrices are therefore of central interest in the study of 
linear differential systems, dynamic control theory, analysis of linear loop pro- 
grams etc. (see e.g. [26,32,36,37]). The literature contains a rich body of work 
on the decidability and/or computational complexity of problems related to the 
long-term behaviour of such systems (see, e.g. [15,19,27,29,36,37]). A question 
of significant interest in this context is whether the powers of a given matrix of 
rational numbers eventually have only non-negative (resp. positive) entries. Such 
matrices, also called eventually non-negative (resp. eventually positive) matri- 
ces, enjoy beautiful algebraic properties ([13,16,25,38]), and have been studied 
by mathematicians, control theorists and computer scientists, among others. 
For example, the work of [26] investigates reachability and holdability of non- 
negative states for linear differential systems — a problem in which eventually 
non-negative matrices play a central role. Similarly, eventual non-negativity (or 
positivity) of a matrix modeling a linear dynamical system makes it possible 
to apply the elegant Perron-Frobenius theory [24,34] to analyze the long-term 
behaviour of the system beyond an initial number of time steps. Another level of 
complexity is added if the dynamics is controlled by a set of matrices rather than 
a single one. For instance, each matrix may model a mode of the linear dynami- 
cal system [23]. In a partial observation setting [22,39], we may not know which 
mode the system has been started in, and hence have to reason about eventual 
properties of this multi-modal system. This reduces to analyzing the sum of 
powers of the per-mode matrices, as we will see. 

Motivated by the above considerations, we study the problem of determining
whether a given matrix of rationals is eventually non-negative or eventually
positive, and also a generalized version of this problem, wherein we ask whether the
weighted sum of powers of a given set of matrices of rationals is eventually
non-negative (resp. positive). Let us formalize the general problem statement.
Given a set $\mathcal{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$, where each $w_i$ is a
rational number and each $A_i$ is a $k \times k$ matrix of rationals, we wish to
determine whether $\sum_{i=1}^{m} w_i \cdot A_i^n$ has only non-negative (resp. positive)
entries for all sufficiently large values of n. We call this problem the Eventually
Non-Negative (resp. Positive) Weighted Sum of Matrix Powers problem, or ENN_SoM
(resp. EP_SoM) for short. The eventual non-negativity (resp. positivity) of powers
of a single matrix is a special case of the above problem, where $\mathcal{A} = \{(1, A)\}$.
We call this special case the Eventually Non-Negative (resp. Positive) Matrix problem,
or ENN_Mat (resp. EP_Mat) for short.

Given the simplicity of the ENN_SoM and EP_SoM problem statements, one may
be tempted to think that there ought to be simple algebraic characterizations
that tell us whether $\sum_{i=1}^{m} w_i \cdot A_i^n$ is eventually non-negative or
positive. But in fact, the landscape is considerably more nuanced. On the one hand,
a solution to the general ENN_SoM or EP_SoM problem would resolve long-standing
open questions in mathematics and computer science. On the other hand, efficient
algorithms can indeed be obtained under certain well-motivated conditions. This paper
is a study of both these aspects of the problem. Our primary contributions can be
summarized as follows. Below, we use $\mathcal{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$
to denote an instance of ENN_SoM or EP_SoM.


1. If $|\mathcal{A}| \geq 2$, we show that both ENN_SoM and EP_SoM are as hard as the
ultimate non-negativity problem for linear recurrence sequences (UNN_LRS, for short).
The decidability of UNN_LRS is closely related to Diophantine approximation,
and remains unresolved despite extensive research (see e.g. [31]).
Since UNN_LRS is coNP-hard (in fact, as hard as the decision problem for the
universal theory of the reals), so are ENN_SoM and EP_SoM when $|\mathcal{A}| \geq 2$.
Thus, unless P = NP, we cannot hope for polynomial-time algorithms, and any
algorithm would also resolve long-standing open problems.

2. On the other hand, regardless of $|\mathcal{A}|$, we show a reduction in the other
direction from ENN_SoM (resp. EP_SoM) to UNN_LRS (resp. UP_LRS, the strict version
of UNN_LRS). As a consequence, we get decidability and complexity bounds for
special cases of ENN_SoM and EP_SoM by exploiting recent results on recurrence
sequences [30,31,35]. For example, if each matrix $A_i$ in $\mathcal{A}$ is simple, i.e.
has all distinct eigenvalues, we obtain PSPACE algorithms.

3. Finally, we consider the case where $A_i$ is diagonalizable (also called
non-defective or inhomogeneous dilation map) for each $(w_i, A_i) \in \mathcal{A}$. This is a
practically useful class of matrices and strictly subsumes simple matrices. We
present a novel reduction technique for a large family of problems (including
eventual non-negativity/positivity, everywhere non-negativity/positivity
etc.) over diagonalizable matrices to the corresponding problem over simple
matrices. This yields effective decision procedures for EP_SoM and ENN_SoM for
diagonalizable matrices. Our reduction makes use of a novel perturbation
analysis that also has other interesting consequences.


As mentioned earlier, the eventual non-negativity and positivity problems for
single rational matrices are well-motivated in the literature, and EP_Mat (or EP_SoM
with $|\mathcal{A}| = 1$) is known to be in PTIME [25]. But for ENN_Mat, no decidability
results are known to the best of our knowledge. From our work, we obtain two
new results about ENN_Mat: (i) in general, ENN_Mat reduces to UNN_LRS, and (ii) for
diagonalizable matrices, we can decide ENN_Mat. What is surprising (see Sect. 5)
is that the latter decidability result goes via ENN_SoM, i.e. the multiple-matrix
case. Thus, reasoning about sums of powers of matrices, viz. ENN_SoM, is useful
even when reasoning about powers of a single matrix, viz. ENN_Mat.


Potential Applications of ENN_SoM and EP_SoM. A prime motivation for defining
the generalized problem ENN_SoM is that it is useful even when reasoning about
the single-matrix case ENN_Mat. Unsurprisingly, however, ENN_SoM and EP_SoM are
also well-motivated in their own right. Indeed, for every application involving a
linear dynamical system that reduces to ENN_Mat/EP_Mat, there is a naturally
defined aggregated version of the application involving multiple independent linear
dynamical systems that reduces to ENN_SoM/EP_SoM (e.g. the swarm-of-robots
example in [3]).

Beyond this, ENN_SoM/EP_SoM arise naturally and directly when solving
problems in different practical scenarios. Due to lack of space, we detail two
applications here and describe more in the longer version of the paper [3].
Partially Observable Multi-modal Systems. Our first example comes from
the domain of cyber-physical systems in a partially observable setting. Consider
a system (e.g. a robot) with m modes of operation, where the dynamics of the i-th
mode is given by a linear transformation encoded as a $k \times k$ matrix of rationals,
say $A_i$. Thus, if the system state at (discrete) time t is represented by a
k-dimensional rational (row) vector $u_t$, the state at time t + 1, when operating in
mode i, is given by $u_t A_i$. Suppose the system chooses to operate in one of its
various modes at time 0, and then sticks to this mode at all subsequent times.
Further, the initial choice of mode is not observable, and we are only given a
probability distribution over modes for the initial choice. This is natural, for
instance, if our robot (multi-modal system) knows the terrain map and can make an
initial choice of which path (mode) to take, but cannot change its path once it has
chosen. If $p_i$ is a rational number denoting the probability of choosing mode i
initially, then the expected state at time n is given by
$\sum_{i=1}^{m} p_i \cdot u_0 A_i^n = u_0 \left(\sum_{i=1}^{m} p_i \cdot A_i^n\right)$.
A safety question in this context is whether, starting from a state $u_0$ with all
non-negative (resp. positive) components, the system is expected to eventually stay
locked in states that have all non-negative (resp. positive) components. In other
words, does $u_0 \left(\sum_{i=1}^{m} p_i \cdot A_i^n\right)$ have all non-negative
(resp. positive) entries for all sufficiently large n? Clearly, a sufficient condition
for an affirmative answer to this question is to have $\sum_{i=1}^{m} p_i \cdot A_i^n$
eventually non-negative (resp. positive), which is an instance of ENN_SoM (resp. EP_SoM).
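The identity for the expected state stated above is just linearity of matrix multiplication, which the following sketch checks with exact rational arithmetic. The helper names are ours and the two modes are an arbitrary illustration. Evaluating the expected state for finitely many n cannot, of course, establish that it stays non-negative for all large n; that is precisely the question posed by ENN_SoM.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Matrix = List[List[Fraction]]
Vector = List[Fraction]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two matrices given as lists of rows."""
    return [[sum((a[i][t] * b[t][j] for t in range(len(b))), Fraction(0))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_pow(a: Matrix, n: int) -> Matrix:
    """A^n by repeated multiplication (A^0 is the identity)."""
    result = [[Fraction(int(i == j)) for j in range(len(a))] for i in range(len(a))]
    for _ in range(n):
        result = mat_mul(result, a)
    return result


def row_times(u: Vector, a: Matrix) -> Vector:
    """Row vector times matrix: (u A)[j] = sum_t u[t] * A[t][j]."""
    return mat_mul([u], a)[0]


def expected_state(u0: Vector, modes: Sequence[Tuple[Fraction, Matrix]], n: int) -> Vector:
    """u0 (sum_i p_i A_i^n): expected state after n steps when mode i is
    chosen once, at time 0, with probability p_i."""
    k = len(u0)
    total = [[Fraction(0)] * k for _ in range(k)]
    for p, a in modes:
        a_n = mat_pow(a, n)
        total = [[total[i][j] + p * a_n[i][j] for j in range(k)] for i in range(k)]
    return row_times(u0, total)


# Two equally likely modes on a 2-dimensional state: swap the components, or scale the first.
half = Fraction(1, 2)
swap = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)]]
scale = [[Fraction(2), Fraction(0)], [Fraction(0), Fraction(1)]]
u0 = [Fraction(1), Fraction(0)]
print(expected_state(u0, [(half, swap), (half, scale)], 3))
# [Fraction(4, 1), Fraction(1, 2)]
```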


Commodity Flow Networks. Consider a flow network where m different
commodities $\{c_1, \ldots, c_m\}$ use the same flow infrastructure spanning k nodes,
but have different loss/regeneration rates along different links. For every pair
of nodes $i, j \in \{1, \ldots, k\}$ and for every commodity $c \in \{c_1, \ldots, c_m\}$,
suppose $A_c[i, j]$ gives the fraction of the flow of commodity c starting from i that
reaches j through the link connecting i and j (if it exists). In general, $A_c[i, j]$ is
the product of the fraction of the flow of commodity c starting at i that is sent along
the link to j, and the loss/regeneration rate of c as it flows in the link from i to
j. Note that $A_c[i, j]$ can be 0 if commodity c is never sent directly from i to j, or
if the commodity is lost or destroyed in flowing along the link from i to j. It can be
shown that $A_c^n[i, j]$ gives the fraction of the flow of c starting from i that reaches
j after n hops through the network. If commodities keep circulating through the
network ad infinitum, we wish to find out whether the network gets saturated, i.e.,
whether for all sufficiently long sequences of hops through the network, there is a
non-zero fraction of some commodity that flows from i to j for every pair i, j. This is
equivalent to asking whether there exists $N \in \mathbb{N}$ such that
$\sum_{i=1}^{m} A_{c_i}^n > 0$ for all $n \geq N$. If different commodities have
different weights (or costs) associated with them, with commodity $c_i$ having the
weight $w_i$, the above formulation asks whether $\sum_{i=1}^{m} w_i \cdot A_{c_i}^n$ is
eventually positive, which is effectively the EP_SoM problem.
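A brute-force check of the weighted sum for small n can help build intuition about this formulation. The sketch below (hypothetical function name, exact `Fraction` arithmetic) lists the n up to a bound for which $\sum_{i=1}^{m} w_i \cdot A_{c_i}^n$ has a non-positive entry. It inspects only finitely many n, so it is a sanity check rather than a decision procedure for EP_SoM.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Matrix = List[List[Fraction]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two matrices given as lists of rows."""
    return [[sum((a[i][t] * b[t][j] for t in range(len(b))), Fraction(0))
             for j in range(len(b[0]))] for i in range(len(a))]


def non_positive_steps(system: Sequence[Tuple[Fraction, Matrix]], n_max: int) -> List[int]:
    """Return every n in 1..n_max for which sum_i w_i * A_i^n has an entry <= 0.

    Only finitely many n are inspected, so the absence of failures near n_max
    is evidence for, not a proof of, eventual positivity.
    """
    k = len(system[0][1])
    powers = [a for _, a in system]  # holds A_i^n, starting at n = 1
    failures = []
    for n in range(1, n_max + 1):
        total = [[sum((w * p[i][j] for (w, _), p in zip(system, powers)), Fraction(0))
                  for j in range(k)] for i in range(k)]
        if any(x <= 0 for row in total for x in row):
            failures.append(n)
        powers = [mat_mul(p, a) for p, (_, a) in zip(powers, system)]
    return failures


F = Fraction
fib = [[F(0), F(1)], [F(1), F(1)]]    # a single commodity
swap = [[F(0), F(1)], [F(1), F(0)]]   # moves everything to the other node
stay = [[F(1), F(0)], [F(0), F(1)]]   # stays where it is
print(non_positive_steps([(F(1), fib)], 10))                # [1]
print(non_positive_steps([(F(1), swap), (F(1), stay)], 6))  # [2, 4, 6]
```

For the single commodity, every power from n = 2 on is positive. For the two commodities, odd n give a matrix of all ones, while every even n gives 2I, which keeps zero entries forever.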


Other Related Work. Our problems of interest are different from other well- 
studied problems that arise if the system is allowed to choose its mode inde- 
pendently at each time step (e.g. as in Markov decision processes [5,21]). The 
crucial difference stems from the fact that we require that the mode be chosen 
once initially, and subsequently, the system must follow the same mode forever.
Thus, our problems are prima facie different from those related to general
probabilistic or weighted finite automata, where reachability of states and ques- 
tions pertaining to long-run behaviour are either known to be undecidable or 
have remained open for long ([6,12,17]). Even in the case of unary probabilis- 
tic/weighted finite automata [1,4,8,11], reachability is known in general to be 
as hard as the Skolem problem on linear recurrences — a long-standing open 
problem, with decidability only known in very restricted cases. The difference 
sometimes manifests itself in the simplicity/hardness of solutions. For example, 
EP_Mat (or EP_SoM with $|\mathcal{A}| = 1$) is known to be in PTIME [25] (not so for
ENN_Mat, however), whereas it is still open whether the reachability problem for unary
probabilistic/weighted automata is decidable. It is also worth remarking that if,
instead of the sum of powers of matrices, we considered the product of their
powers, we would effectively be solving problems akin to the mortality problem
[9,10] (which asks whether the all-zero matrix can be reached by multiplying, with
repetition, matrices from a given set), a notoriously difficult problem. The
diagonalizable matrix restriction is a common feature in the context of linear loop
programs (see, e.g., [7,28]), where matrices are used for updates. Finally, logics 
to reason about temporal properties of linear loops have been studied, although 
decidability is known only in restricted settings, e.g. when each predicate defines 
a semi-algebraic set contained in some 3-dimensional subspace, or has intrinsic 
dimension 1 [20]. 


2 Preliminaries 


The symbols $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{A}$ and $\mathbb{C}$ denote the sets of
rational, real, algebraic and complex numbers, respectively. Recall that an algebraic
number is a root of a non-zero polynomial in one variable with rational coefficients.
An algebraic number can be real or complex. We use $\mathbb{RA}$ to denote the set of
real algebraic numbers (which includes all rationals). The sum, difference and product
of two (real) algebraic numbers are again (real) algebraic. Furthermore, every root of
a polynomial equation with algebraic coefficients is again algebraic, and every real
root of a polynomial equation with real algebraic coefficients is real algebraic. We
call matrices with all rational (resp. real algebraic or real) entries rational (resp.
real algebraic or real) matrices. We use $A \in \mathbb{Q}^{k \times l}$ (resp.
$A \in \mathbb{R}^{k \times l}$ and $A \in \mathbb{RA}^{k \times l}$) to denote that A is a
$k \times l$ rational (resp. real and real algebraic) matrix, with rows indexed
1 through k, and columns indexed 1 through l. The entry in the i-th row and j-th
column of a matrix A is denoted $A[i, j]$. If A is a column vector (i.e. l = 1),
we often use boldface letters, viz. $\mathbf{A}$, to refer to it. In such cases, we use
$\mathbf{A}[i]$ to denote the i-th component of $\mathbf{A}$, i.e. $A[i, 1]$. The
transpose of a $k \times l$ matrix A, denoted $A^T$, is the $l \times k$ matrix obtained
by letting $A^T[i, j] = A[j, i]$ for all $i \in \{1, \ldots, l\}$ and
$j \in \{1, \ldots, k\}$. Matrix A is said to be non-negative (resp. positive) if all
entries of A are non-negative (resp. positive) real numbers. Given a set
$\mathcal{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$ of (weight, matrix) pairs, where each
$A_i \in \mathbb{Q}^{k \times k}$ (resp. $\in \mathbb{RA}^{k \times k}$) and each
$w_i \in \mathbb{Q}$, we use $\sum \mathcal{A}^n$ to denote the weighted matrix sum
$\sum_{i=1}^{m} w_i \cdot A_i^n$, for every natural number $n \geq 0$. Note that
$\sum \mathcal{A}^n$ is itself a matrix in $\mathbb{Q}^{k \times k}$ (resp. $\mathbb{RA}^{k \times k}$).
Definition 1. We say that $\mathcal{A}$ is eventually non-negative (resp. positive) iff there
is a positive integer N such that $\sum \mathcal{A}^n$ is non-negative (resp. positive) for all $n \geq N$.


The ENN_SoM (resp. EP_SoM) problem, described in Sect. 1, can now be rephrased
as: Given a set $\mathcal{A}$ of pairs of rational weights and rational $k \times k$ matrices,
is $\mathcal{A}$ eventually non-negative (resp. positive)? As mentioned in Sect. 1, if
$\mathcal{A} = \{(1, A)\}$, the ENN_SoM (resp. EP_SoM) problem is also called ENN_Mat
(resp. EP_Mat). We note that the study of ENN_SoM and EP_SoM with $|\mathcal{A}| = 1$ is
effectively the study of ENN_Mat and EP_Mat, i.e., wlog we can assume w = 1.

The characteristic polynomial of a matrix $A \in \mathbb{RA}^{k \times k}$ is given by
$\det(A - \lambda I)$, where I denotes the $k \times k$ identity matrix. Note that this is
a degree-k polynomial in λ. The roots of the characteristic polynomial are called the
eigenvalues of A. A non-zero vector solution of the equation $A\mathbf{x} = \lambda_i \mathbf{x}$,
where $\lambda_i$ is an eigenvalue of A, is called an eigenvector of A. Although
$A \in \mathbb{RA}^{k \times k}$, in general it can have eigenvalues $\lambda \in \mathbb{C}$,
all of which are algebraic numbers. An eigenvector is said to be positive (resp.
non-negative) if each component of the eigenvector is a positive (resp. non-negative)
rational number. A matrix is called simple if all its eigenvalues are distinct. Further,
a matrix A is called diagonalizable if there exist an invertible matrix S and a diagonal
matrix D such that $SDS^{-1} = A$.

The study of weighted sums of powers of matrices is intimately related to the study of linear recurrence sequences (LRS), as we shall see. We now present some definitions and useful properties of LRS. For more details on LRS, the reader is referred to the work of Everest et al. [14]. A sequence of rational numbers $(u) = (u_n)_{n=0}^{\infty}$ is called an LRS of order k (> 0) if the $n^{th}$ term of the sequence, for all $n \geq k$, can be expressed using the recurrence $u_n = a_{k-1}u_{n-1} + \ldots + a_1u_{n-k+1} + a_0u_{n-k}$. Here, $a_0\ (\neq 0), a_1, \ldots, a_{k-1} \in \mathbb{Q}$ are called the coefficients of the LRS, and $u_0, u_1, \ldots, u_{k-1} \in \mathbb{Q}$ are called the initial values of the LRS. Given the coefficients and initial values, an LRS is uniquely defined. However, the same LRS may be defined by multiple sets of coefficients and corresponding initial values. An LRS (u) is said to be periodic with period p if it can be defined by the recurrence $u_n = u_{n-p}$ for all $n \geq p$.

Given an LRS (u), its characteristic polynomial is $p_{(u)}(x) = x^k - \sum_{i=0}^{k-1} a_i x^i$. We can factorize the characteristic polynomial as $p_{(u)}(x) = \prod_{j=1}^{d}(x - \lambda_j)^{\rho_j}$, where $\lambda_j$ is a root, called a characteristic root, of algebraic multiplicity $\rho_j$. An LRS is called simple if $\rho_j = 1$ for all j, i.e. all characteristic roots are distinct. Let $\{\lambda_1, \lambda_2, \ldots, \lambda_d\}$ be the distinct roots of $p_{(u)}(x)$ with multiplicities $\rho_1, \rho_2, \ldots, \rho_d$ respectively. Then the $n^{th}$ term of the LRS, denoted $u_n$, can be expressed as $u_n = \sum_{j=1}^{d} q_j(n)\lambda_j^n$. Here the $q_j(x) \in \mathbb{C}[x]$ are univariate polynomials of degree at most $\rho_j - 1$ with complex coefficients, and $\sum_{j=1}^{d}\rho_j = k$. This representation of an LRS is known as the exponential polynomial solution representation.

It is well known that scaling an LRS by a constant gives another LRS, and that the sum and product of two LRSes are also LRSes (Theorem 4.1 in [14]). Given an LRS (u) defined by $u_n = a_{k-1}u_{n-1} + \ldots + a_1u_{n-k+1} + a_0u_{n-k}$, we define its companion matrix $M_{(u)}$ to be the $k \times k$ matrix shown in Fig. 1.
When (u) is clear from the context, we often omit the subscript for clarity of notation, and use M for $M_{(u)}$. Let $\mathbf{u} = (u_{k-1}, \ldots, u_0)$ be a row vector containing the k initial values of the recurrence. Let $\mathbf{e}_k = (0, 0, \ldots, 1)^T$ be the k-dimensional column vector whose last element is 1 and whose other elements are 0. It is easy to see that for all $n \geq 1$, $\mathbf{u}M^n\mathbf{e}_k$ gives $u_n$. Note that the eigenvalues of the matrix M are exactly the roots of the characteristic polynomial of the LRS (u).

$$M_{(u)} = \begin{pmatrix} a_{k-1} & 1 & 0 & \cdots & 0 \\ a_{k-2} & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ a_1 & 0 & 0 & \cdots & 1 \\ a_0 & 0 & 0 & \cdots & 0 \end{pmatrix}$$

Fig. 1. Companion matrix

For $\mathbf{u} = (u_{k-1}, \ldots, u_0)$, we call the matrix $G_{(u)} = \begin{pmatrix} 0 & \mathbf{u} \\ \mathbf{0}^T & M_{(u)} \end{pmatrix}$ the generator matrix of the LRS (u), where $\mathbf{0}$ is a k-dimensional vector of all 0s. We omit the subscript and use G instead of $G_{(u)}$ when the LRS (u) is clear from the context. It is easy to show from the above that $u_n = G^{n+1}[1, k+1]$ for all $n \geq 0$.

We say that an LRS (u) is ultimately non-negative (resp. ultimately positive) iff there exists $N > 0$ such that $\forall n \geq N$, $u_n \geq 0$ (resp. $u_n > 0$)¹. The problem of determining whether a given LRS is ultimately non-negative (resp. ultimately positive) is called the Ultimate Non-negativity (resp. Ultimate Positivity) problem for LRS. We use UNN_LRS (resp. UP_LRS) to refer to this problem. It is known [19] that UNN_LRS and UP_LRS are polynomially inter-reducible, and these problems have been widely studied in the literature (e.g., [27,31,32]). A closely related problem is the Skolem problem. In this problem, we are given an LRS (u) and must determine whether there exists $n \geq 0$ such that $u_n = 0$. The relation between the Skolem problem and UNN_LRS (resp. UP_LRS) has been extensively studied in the literature (e.g., [18,19,33]).
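The identities $\mathbf{u}M^n\mathbf{e}_k = u_n$ and $G^{n+1}[1,k+1] = u_n$ stated above can be checked mechanically. The following Python sketch uses our own helper names and exact rational arithmetic. It builds the companion matrix of Fig. 1 and the generator matrix from a coefficient list $[a_0, \ldots, a_{k-1}]$ and the initial values, and then reads off the terms of the LRS.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Exact product of two matrices given as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(X, n):
    """X^n by repeated multiplication (n >= 0)."""
    R = [[Fraction(int(i == j)) for j in range(len(X))] for i in range(len(X))]
    for _ in range(n):
        R = mat_mul(R, X)
    return R

def companion(coeffs):
    """Fig. 1 for coeffs = [a_0, ..., a_{k-1}]: first column a_{k-1}, ..., a_0
    (top to bottom), ones on the superdiagonal, zeros elsewhere."""
    k = len(coeffs)
    M = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k):
        M[i][0] = Fraction(coeffs[k - 1 - i])
        if i + 1 < k:
            M[i][i + 1] = Fraction(1)
    return M

def generator(coeffs, init):
    """G = [[0, u], [0^T, M]] with u = (u_{k-1}, ..., u_0) and init = [u_0, ..., u_{k-1}]."""
    u = [Fraction(x) for x in reversed(init)]
    return [[Fraction(0)] + u] + [[Fraction(0)] + row for row in companion(coeffs)]

def lrs_terms(coeffs, init, count):
    """First `count` terms computed directly from u_n = a_{k-1}u_{n-1} + ... + a_0 u_{n-k}."""
    k = len(coeffs)
    terms = [Fraction(x) for x in init]
    while len(terms) < count:
        terms.append(sum(Fraction(coeffs[j]) * terms[-k + j] for j in range(k)))
    return terms[:count]

coeffs, init = [1, 1], [0, 1]          # Fibonacci: u_n = u_{n-1} + u_{n-2}
G = generator(coeffs, init)
print([int(mat_pow(G, n + 1)[0][len(coeffs)]) for n in range(10)])
# prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Indices in the code are 0-based, so the entry $G^{n+1}[1, k+1]$ of the text is `mat_pow(G, n + 1)[0][k]`.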


3 Hardness of Eventual Non-negativity and Positivity 


In this section, we show that UNN_LRS (resp. UP_LRS) polynomially reduces to ENN_SoM (resp. EP_SoM) when $|\mathfrak{A}| \geq 2$. UNN_LRS and UP_LRS are known to be coNP-hard; in fact, they are as hard as the decision problem for the universal theory of reals (Theorem 5.3 in [31]). We therefore conclude that ENN_SoM and EP_SoM are also coNP-hard and at least as hard as the decision problem for the universal theory of reals when $|\mathfrak{A}| \geq 2$. Thus, unless P = NP, there is no hope of finding polynomial-time solutions to these problems.


Theorem 1. UNN_LRS reduces to ENN_SoM with $|\mathfrak{A}| \geq 2$ in polynomial time.


Proof. Let (u) be an LRS of order k defined by the recurrence $u_n = a_{k-1}u_{n-1} + \ldots + a_1u_{n-k+1} + a_0u_{n-k}$ with initial values $u_0, u_1, \ldots, u_{k-1}$. We construct two matrices $A_1$ and $A_2$ such that (u) is ultimately non-negative iff $(A_1^n + A_2^n)$ is eventually non-negative. Let $A_1 = \begin{pmatrix} 0 & \mathbf{u} \\ \mathbf{0}^T & M \end{pmatrix}$ be the generator matrix of (u), and let $A_2 = \begin{pmatrix} 0 & \mathbf{0} \\ \mathbf{0}^T & P \end{pmatrix}$. Here $P \in \mathbb{Q}^{k\times k}$ is constructed such that $P[i,j] \geq |M[i,j]|$ for all i, j. For example, P can be constructed as follows: $P[i,j] = M[i,j]$ for all $j \in [2,k]$ and $i \in [1,k]$, and $P[i,j] = \max(|a_0|, |a_1|, \ldots, |a_{k-1}|) + 1$ for $j = 1$.

Now consider the sequence of matrices defined by $A_1^n + A_2^n$, for all $n \geq 1$. By properties of the generator matrix, it is easily verified that $A_1^n = \begin{pmatrix} 0 & \mathbf{u}M^{n-1} \\ \mathbf{0}^T & M^n \end{pmatrix}$. Similarly, we get $A_2^n = \begin{pmatrix} 0 & \mathbf{0} \\ \mathbf{0}^T & P^n \end{pmatrix}$. Therefore, $A_1^n + A_2^n = \begin{pmatrix} 0 & \mathbf{u}M^{n-1} \\ \mathbf{0}^T & P^n + M^n \end{pmatrix}$ for all $n \geq 1$.

We can now observe that $P^n + M^n$ is always non-negative. Since $P[i,j] \geq |M[i,j]| \geq 0$ for all $i, j \in \{1, \ldots, k\}$, a simple induction on n gives $P^n[i,j] \geq |M^n[i,j]|$. Hence $P^n[i,j] + M^n[i,j] \geq 0$ for all $i, j \in \{1, \ldots, k\}$ and $n \geq 1$. The elements $A(n)[1,2], \ldots, A(n)[1,k+1]$ consist of $(u_{n+k-2}, \ldots, u_n, u_{n-1})$, and the rest of the elements are non-negative. We conclude that $A(n) = A_1^n + A_2^n$ is eventually non-negative iff (u) is ultimately non-negative.

¹ Ultimately non-negative (resp. ultimately positive) LRS, as defined by us, have also been called ultimately positive (resp. strictly positive) LRS elsewhere in the literature [31]. However, we choose to use terminology that is consistent across matrices and LRS, to avoid notational confusion.
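The construction in the proof of Theorem 1 is easy to mechanize. The following sketch is our own illustration, and its helper names are hypothetical. It builds $A_1$ and $A_2$ from the coefficients $[a_0, \ldots, a_{k-1}]$ and the initial values of (u), using the example choice of P given in the proof. With it, one can observe that the first row of $A_1^n + A_2^n$ carries terms of (u) while every other entry is non-negative.

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(X, n):
    R = [[Fraction(int(i == j)) for j in range(len(X))] for i in range(len(X))]
    for _ in range(n):
        R = mat_mul(R, X)
    return R

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def reduction_matrices(coeffs, init):
    """Return (A_1, A_2) of the proof of Theorem 1 for coeffs = [a_0, ..., a_{k-1}]
    and init = [u_0, ..., u_{k-1}]."""
    k = len(coeffs)
    # Companion matrix M (Fig. 1): first column a_{k-1}, ..., a_0; ones on the superdiagonal.
    M = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k):
        M[i][0] = Fraction(coeffs[k - 1 - i])
        if i + 1 < k:
            M[i][i + 1] = Fraction(1)
    # P equals M outside column 1 and holds max|a_j| + 1 in column 1, so P >= |M| entrywise.
    bound = max(abs(Fraction(a)) for a in coeffs) + 1
    P = [[bound if j == 0 else M[i][j] for j in range(k)] for i in range(k)]
    u = [Fraction(x) for x in reversed(init)]                       # (u_{k-1}, ..., u_0)
    A1 = [[Fraction(0)] + u] + [[Fraction(0)] + M[i] for i in range(k)]   # generator matrix
    A2 = [[Fraction(0)] * (k + 1)] + [[Fraction(0)] + P[i] for i in range(k)]
    return A1, A2

# u_n = -u_{n-1} + u_{n-2} with u_0 = u_1 = 1 gives 1, 1, 0, 1, -1, 2, -3, ...
A1, A2 = reduction_matrices([1, -1], [1, 1])
A3 = mat_add(mat_pow(A1, 3), mat_pow(A2, 3))
print([int(x) for x in A3[0]])    # first row (0, u_3, u_2); prints [0, 1, 0]
```

Only the first row can become negative. This is why the sign pattern of (u) alone decides eventual non-negativity of the sum.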


Observe that the same reduction technique works if we are required to use more than 2 matrices in ENN_SoM. Indeed, we can construct matrices $A_3, A_4, \ldots, A_m$ in the same way as $A_2$ in the reduction above. For each of them, the $k \times k$ matrix in the bottom right (see the definition of $A_2$) is chosen to have positive values greater than the maximum absolute value of every element in the companion matrix.


A simple modification of the above proof, in which the block matrix $A_2$ is modified using $\mathbf{1}$, the k-dimensional vector of all 1's, gives us the corresponding hardness result for EP_SoM (see [3] for details).


Theorem 2. UP_LRS reduces to EP_SoM with $|\mathfrak{A}| \geq 2$ in polynomial time.


We remark that for the reduction technique used in Theorems 1 and 2 to work, we need at least two (weight, matrix) pairs in $\mathfrak{A}$. For an explanation of why this reduction does not work when $|\mathfrak{A}| = 1$, we refer the reader to [3]. Having shown the hardness of ENN_SoM and EP_SoM when $|\mathfrak{A}| \geq 2$, we now proceed to establish upper bounds on the computational complexity of these problems.


4 Upper Bounds on Eventual Non-negativity and Positivity


In this section, we show that ENN_SoM (resp. EP_SoM) is polynomially reducible to UNN_LRS (resp. UP_LRS), regardless of $|\mathfrak{A}|$.


Theorem 3. ENN_SoM reduces to UNN_LRS in polynomial time.


The proof is in two parts. First, we show that for a single matrix A, we can construct a linear recurrence (a) such that A is eventually non-negative iff (a) is ultimately non-negative. Then, starting from such a linear recurrence for each matrix in $\mathfrak{A}$, we construct a new LRS, say $(a^*)$. This LRS has the property that the weighted sum of powers of the matrices in $\mathfrak{A}$ is eventually non-negative iff $(a^*)$ is ultimately non-negative. Our proof makes crucial use of the following property of matrices.


Lemma 1 (Adapted from Lemma 1.1 of [19]). Let $A \in \mathbb{Q}^{k\times k}$ be a rational matrix with characteristic polynomial $p_A(\lambda) = \det(A - \lambda I)$. Suppose we define the sequence $(a^{i,j})$ for every $1 \leq i, j \leq k$ as follows: $a^{i,j}_n = A^{n+1}[i,j]$, for all $n \geq 0$. Then $(a^{i,j})$ is an LRS of order k with characteristic polynomial $p_A(x)$ and initial values given by $a^{i,j}_0 = A^1[i,j], \ldots, a^{i,j}_{k-1} = A^k[i,j]$.

This follows from the Cayley-Hamilton Theorem, and the reader is referred to [19] for further details. From Lemma 1, it is easy to see that the LRSes $(a^{i,j})$ for all $1 \leq i, j \leq k$ share the same order and characteristic polynomial (hence the same defining recurrence). They differ only in their initial values. For notational convenience, we say that the LRS $(a^{i,j})$ is generated by $A[i,j]$.
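Lemma 1 can be sanity-checked numerically. The sketch below is our own illustration. It computes the coefficients of $\det(xI - A)$ with the Faddeev-LeVerrier method, a standard technique swapped in here for convenience. Note that $\det(xI - A) = (-1)^k\det(A - xI)$ has the same roots as $p_A$. The sketch then checks that every entry sequence $a^{i,j}_n = A^{n+1}[i,j]$ obeys the recurrence given by the Cayley-Hamilton Theorem.

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly(A):
    """Coefficients [c_0, ..., c_k] of det(xI - A) = sum_j c_j x^j (Faddeev-LeVerrier)."""
    k = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * k + [Fraction(1)]
    N = [[Fraction(0)] * k for _ in range(k)]
    for m in range(1, k + 1):
        AN = mat_mul(A, N)
        N = [[AN[i][j] + (c[k - m + 1] if i == j else 0) for j in range(k)] for i in range(k)]
        AN = mat_mul(A, N)
        c[k - m] = -sum(AN[i][i] for i in range(k)) / m
    return c

def entry_lrs(A, i, j, count):
    """a^{i,j}_n = A^{n+1}[i][j] (0-indexed i, j) for n = 0, ..., count - 1."""
    seq, P = [], A
    for _ in range(count):
        seq.append(P[i][j])
        P = mat_mul(P, A)
    return seq

def follows_recurrence(seq, c):
    """Check a_n = -(c_0 a_{n-k} + ... + c_{k-1} a_{n-1}) for all n >= k (Cayley-Hamilton)."""
    k = len(c) - 1
    return all(seq[n] == -sum(c[j] * seq[n - k + j] for j in range(k))
               for n in range(k, len(seq)))

A = [[2, -1, 0], [1, 3, 1], [0, -2, 1]]
c = char_poly(A)            # det(xI - A) = x^3 - 6x^2 + 14x - 11
ok = all(follows_recurrence(entry_lrs(A, i, j, 15), c) for i in range(3) for j in range(3))
print(ok)                   # prints True
```

All $k^2$ entry sequences share the coefficient list `c` and differ only in their first k terms, matching the remark after Lemma 1.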


Proposition 1. A matrix $A \in \mathbb{Q}^{k\times k}$ is eventually non-negative iff all LRSes $(a^{i,j})$ generated by $A[i,j]$ for all $1 \leq i, j \leq k$ are ultimately non-negative.


The proof follows from the definition of eventually non-negative matrices and the definition of $(a^{i,j})$. Next we define the notion of interleaving of LRSes.


Definition 2. Consider a set $S = \{(u^i) : 0 \leq i < t\}$ of t LRSes, each having order k and the same characteristic polynomial. An LRS (v) is said to be the LRS-interleaving of S iff $v_{nt+s} = u^s_n$ for all $n \in \mathbb{N}$ and $0 \leq s < t$.


Observe that the order of (v) is tk and its initial values are given by the interleaving of the k initial values of the LRSes $(u^i)$. Formally, the initial values are $v_{jt+i} = u^i_j$ for $0 \leq i < t$ and $0 \leq j < k$. The characteristic polynomial $p_{(v)}(x)$ is equal to $p_{(u^i)}(x^t)$.


Proposition 2. The LRS-interleaving (v) of a set of LRSes $S = \{(u^i) : 0 \leq i < t\}$ is ultimately non-negative iff each LRS $(u^i)$ in S is ultimately non-negative.
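Definition 2 and the remark on $p_{(v)}(x) = p_{(u^i)}(x^t)$ can be illustrated with a short Python sketch. The helper names are ours. The sketch interleaves t sequences that share one recurrence and checks that the interleaved sequence obeys the order-$tk$ recurrence. In that recurrence, each coefficient $a_j$ is moved to position $tj$, so that $v_m = \sum_j a_j v_{m-t(k-j)}$.

```python
def lrs_terms(coeffs, init, count):
    """Terms of u_n = a_{k-1}u_{n-1} + ... + a_0 u_{n-k}, with coeffs = [a_0, ..., a_{k-1}]."""
    k = len(coeffs)
    terms = list(init)
    while len(terms) < count:
        terms.append(sum(coeffs[j] * terms[-k + j] for j in range(k)))
    return terms[:count]

def interleave(seqs):
    """LRS-interleaving (Definition 2): v_{n*t+s} = u^s_n for 0 <= s < t."""
    t, length = len(seqs), min(len(s) for s in seqs)
    return [seqs[s][n] for n in range(length) for s in range(t)]

def interleaved_coeffs(coeffs, t):
    """Coefficients of the order-t*k recurrence with characteristic polynomial p_u(x^t)."""
    b = [0] * (t * len(coeffs))
    for j, a in enumerate(coeffs):
        b[t * j] = a
    return b

def satisfies(seq, coeffs):
    """True iff seq obeys the recurrence with the given coefficients wherever it applies."""
    k = len(coeffs)
    return all(seq[n] == sum(coeffs[j] * seq[n - k + j] for j in range(k))
               for n in range(k, len(seq)))

# Three LRSes sharing u_n = 2u_{n-1} - u_{n-2}, i.e. coeffs = [a_0, a_1] = [-1, 2].
us = [lrs_terms([-1, 2], init, 10) for init in ([0, 1], [5, 3], [-2, -2])]
v = interleave(us)
print(v[:6], satisfies(v, interleaved_coeffs([-1, 2], 3)))
# prints [0, 5, -2, 1, 3, -2] True
```

In line with Proposition 2, the tail of `v` takes negative values because the second sequence $(5, 3, 1, -1, \ldots)$ does.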


Now, from the definitions of the LRSes $(a^{i,j})$, $(u^i)$ and (v), and from Propositions 1 and 2, we obtain the following crucial lemma.

Lemma 2. Given a matrix $A \in \mathbb{Q}^{k\times k}$, let $S = \{(u^i) \mid u^i_n = a^{p,q}_n$, where $p = \lfloor i/k \rfloor + 1$, $q = (i \bmod k) + 1$, $0 \leq i < k^2\}$ be the set of $k^2$ LRSes mentioned in Lemma 1. The LRS (v) generated by LRS-interleaving of S satisfies the following:

1. A is eventually non-negative iff (v) is ultimately non-negative.

2. $p_{(v)}(x) = \prod_{i=1}^{k}(x^{k^2} - \lambda_i)$, where $\lambda_1, \ldots, \lambda_k$ are the (possibly repeated) eigenvalues of A.

3. $v_{rk^2+sk+t} = u^{sk+t}_r = a^{s+1,t+1}_r = A^{r+1}[s+1,t+1]$ for all $r \in \mathbb{N}$, $0 \leq s, t < k$.


We lift this argument from a single matrix to a weighted sum of matrices. 
Lemma 3. Given $\mathfrak{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$, there exists a linear recurrence $(a^*)$ such that $\sum_{i=1}^{m} w_iA_i^n$ is eventually non-negative iff $(a^*)$ is ultimately non-negative.


Proof. For each matrix $A_i$ in $\mathfrak{A}$, let $(v^i)$ be the interleaved LRS as constructed in Lemma 2. Let $w_i(v^i)$ denote the scaled LRS whose $n^{th}$ entry is $w_iv^i_n$ for all $n \geq 0$. The LRS $(a^*)$ is obtained by adding the scaled LRSes $w_1(v^1), w_2(v^2), \ldots, w_m(v^m)$. By construction, $a^*_n = \sum_{i=1}^{m} w_iv^i_n$, so $a^*_n$ is non-negative iff $\sum_{i=1}^{m} w_iv^i_n$ is non-negative. From the definition of $v^i$ (see Lemma 2), we also know that for all $n \geq 0$, $v^i_n = A_i^{r+1}[s+1,t+1]$, where $r = \lfloor n/k^2 \rfloor$, $s = \lfloor (n \bmod k^2)/k \rfloor$ and $t = n \bmod k$. Therefore, $a^*_n$ is non-negative iff $\sum_{i=1}^{m} w_iA_i^{r+1}[s+1,t+1]$ is non-negative. It follows that $(a^*)$ is ultimately non-negative iff $\sum_{i=1}^{m} w_iA_i^n$ is eventually non-negative.
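The construction in the proof of Lemma 3 can be sketched in Python for $2 \times 2$ matrices, under the assumptions that follow. The helper names are ours. The $2 \times 2$ restriction lets us write $p_A(x) = x^2 - \mathrm{tr}(A)x + \det(A)$ in closed form; we use the monic convention $\det(xI - A)$. The sketch builds each interleaved LRS $(v^i)$ via the index map of Lemma 2 and forms $a^*_n = \sum_i w_iv^i_n$. It then checks that $(a^*)$ obeys the recurrence given by the product of the lifted characteristic polynomials $p_{A_i}(x^{k^2})$. The product is a multiple of the LCM that the paper uses in Sect. 5.1; we use it here only for simplicity.

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_mul(p, q):
    """Product of polynomials given as coefficient lists (index = degree)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def interleaved_lrs(A, count):
    """First `count` terms of (v) from Lemma 2: v_n = A^{r+1}[s][t] (0-indexed),
    where r = n // k^2, s = (n % k^2) // k and t = n % k."""
    k = len(A)
    powers = [A]                                   # powers[r] = A^{r+1}
    while len(powers) * k * k < count:
        powers.append(mat_mul(powers[-1], A))
    return [powers[n // (k * k)][(n % (k * k)) // k][n % k] for n in range(count)]

def weighted_lrs(pairs, count):
    """a*_n = sum_i w_i * v^i_n, as in the proof of Lemma 3."""
    seqs = [interleaved_lrs(A, count) for _, A in pairs]
    return [sum(Fraction(w) * s[n] for (w, _), s in zip(pairs, seqs)) for n in range(count)]

def lifted_char_poly_2x2(A):
    """p_A(x^4) for a 2x2 matrix A with p_A(x) = x^2 - tr(A) x + det(A) (Lemma 2, item 2)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    p = [0] * 9
    p[0], p[4], p[8] = det, -tr, 1
    return p

def satisfies_poly(seq, p):
    """True iff sum_j p[j] * seq[n + j] == 0 for every window of the prefix."""
    d = len(p) - 1
    return all(sum(p[j] * seq[n + j] for j in range(d + 1)) == 0
               for n in range(len(seq) - d))

A1, A2 = [[1, 2], [3, -1]], [[0, 1], [-2, 3]]
pairs = [(2, A1), (Fraction(-1, 3), A2)]
a_star = weighted_lrs(pairs, 60)
# The sum of the scaled LRSes obeys the recurrence of the product of their characteristic polynomials.
print(satisfies_poly(a_star, poly_mul(lifted_char_poly_2x2(A1), lifted_char_poly_2x2(A2))))
# prints True
```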


Theorem 3, the main result of this section, follows from Lemma 3. The following corollary can be shown mutatis mutandis.


Corollary 1. EP_SoM reduces to UP_LRS in polynomial time.


We note that a similar argument also applies to the eventual non-negativity (positivity) of only certain indices of the matrix. By interleaving only the LRSes corresponding to those indices of the matrices in $\mathfrak{A}$, we can show that this problem is equivalent to UNN_LRS (UP_LRS).


5 Decision Procedures for Special Cases 


Since there are no known algorithms for solving UNN_LRS in general, the results of the previous section present a bleak picture for deciding ENN_SoM and EP_SoM. We now show that these problems can be solved in some important special cases.


5.1 Simple Matrices and Matrices with Real Algebraic Eigenvalues 
Our first positive result follows from known results for special classes of LRSes. 


Theorem 4. ENN_SoM and EP_SoM are decidable for $\mathfrak{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$ if one of the following conditions holds for all $i \in \{1, \ldots, m\}$.

1. All $A_i$ are simple. In this case, ENN_SoM and EP_SoM are in PSPACE. Additionally, if the dimension k of all $A_i$ is fixed, ENN_SoM and EP_SoM are in PTIME.
2. All eigenvalues of $A_i$ are roots of real algebraic numbers. In this case, ENN_SoM and EP_SoM are in $\mathsf{coNP}^{\mathsf{PosSLP}}$ (a complexity class in the Counting Hierarchy, contained in PSPACE).


Proof. Suppose each $A_i \in \mathbb{Q}^{k\times k}$, and let $\lambda_{i,1}, \ldots, \lambda_{i,k}$ be the (possibly repeated) eigenvalues of $A_i$. The characteristic polynomial of $A_i$ is $p_{A_i}(x) = \prod_{j=1}^{k}(x - \lambda_{i,j})$. Denote by $(a^i)$ the LRS obtained from $A_i$ by LRS-interleaving as in Lemma 2. By Lemma 2, we have (i) $a^i_{rk^2+sk+t} = A_i^{r+1}[s+1,t+1]$ for all $r \in \mathbb{N}$ and $0 \leq s, t < k$, and (ii) $p_{(a^i)}(x) = \prod_{j=1}^{k}(x^{k^2} - \lambda_{i,j})$. We now define the scaled LRS $(b^i)$, where $b^i_n = w_ia^i_n$ for all $n \in \mathbb{N}$. Scaling does not change the characteristic polynomial of an LRS (see [3] for a simple proof). Hence we have $p_{(b^i)}(x) = \prod_{j=1}^{k}(x^{k^2} - \lambda_{i,j})$. Once the LRSes $(b^1), \ldots, (b^m)$ are obtained as above, we sum them to obtain the LRS $(b^*)$. Thus, for all $n \in \mathbb{N}$, we have $b^*_n = \sum_{i=1}^{m} b^i_n = \sum_{i=1}^{m} w_ia^i_n = \sum_{i=1}^{m} w_iA_i^{r+1}[s+1,t+1]$, where $n = rk^2 + sk + t$, $r \in \mathbb{N}$ and $0 \leq s, t < k$. Hence, ENN_SoM (resp. EP_SoM) for $\{(w_1, A_1), \ldots, (w_m, A_m)\}$ polynomially reduces to UNN_LRS (resp. UP_LRS) for $(b^*)$.

By [14], we know that the characteristic polynomial $p_{(b^*)}(x)$ is the LCM of the characteristic polynomials $p_{(b^i)}(x)$ for $1 \leq i \leq m$. If $A_i$ is simple, there are no repeated roots of $p_{(b^i)}(x)$. If this holds for all $i \in \{1, \ldots, m\}$, there are no repeated roots of the LCM of $p_{(b^1)}(x), \ldots, p_{(b^m)}(x)$ either. Hence, $p_{(b^*)}(x)$ has no repeated roots. Similarly, if all eigenvalues of $A_i$ are roots of real algebraic numbers, so are all roots of $p_{(b^i)}(x)$. It follows that all roots of the LCM of $p_{(b^1)}(x), \ldots, p_{(b^m)}(x)$, i.e. of $p_{(b^*)}(x)$, are also roots of real algebraic numbers.

The theorem now follows from the following two known results about LRS. 


1. UNN_LRS (resp. UP_LRS) for simple LRS is in PSPACE. Furthermore, if the LRS is of bounded order, UNN_LRS (resp. UP_LRS) is in PTIME [31].

2. UNN_LRS (resp. UP_LRS) for LRS in which all roots of the characteristic polynomial are roots of real algebraic numbers is in $\mathsf{coNP}^{\mathsf{PosSLP}}$ [9].


Remark: The technique used in [31] to decide UNN_LRS (resp. UP_LRS) for simple rational LRS also works for simple LRS with real algebraic coefficients and initial values. This allows us to generalize Theorem 4(1) to the case where all $A_i$'s and $w_i$'s are real algebraic matrices and weights respectively.


5.2 Diagonalizable Matrices 


We now ask if ENN_SoM and EP_SoM can be decided if each matrix $A_i$ is diagonalizable. Diagonalizable matrices strictly generalize simple matrices. Hence Theorem 4(1) cannot answer this question directly, unless one looks under the hood of the (highly non-trivial) proof of decidability of non-negativity/positivity of simple LRSes. The main contribution of this section is a reduction for deciding ENN_SoM and EP_SoM for diagonalizable matrices. The reduction uses, as a black box, a decision procedure for the corresponding problem for simple real-algebraic matrices, i.e. it needs neither the operational details of that procedure nor the details of its proof of correctness.

Before we proceed further, let us consider an example of a non-simple matrix (i.e. one with repeated eigenvalues) that is diagonalizable. Specifically, matrix A in Fig. 2 has eigenvalues 2, 2 and $-1$, and can be written as $SDS^{-1}$. Here D is the $3 \times 3$ diagonal matrix with $D[1,1] = D[2,2] = 2$ and $D[3,3] = -1$, and S is the $3 \times 3$ matrix with columns $(-4,1,0)^T$, $(2,0,1)^T$ and $(-1,1,1)^T$.

$$A = \begin{pmatrix} 5 & 12 & -6 \\ -3 & -10 & 6 \\ -3 & -12 & 8 \end{pmatrix}$$

Fig. 2. Diagonalizable matrix
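The decomposition claimed for Fig. 2 can be verified with integer arithmetic alone. The following sketch is our own check and not part of the paper. It verifies $AS = SD$ and $\det(S) \neq 0$, which together are equivalent to $A = SDS^{-1}$. It also confirms that $\det(A) = -4$ and $\mathrm{tr}(A) = 3$ agree with the eigenvalues 2, 2 and $-1$.

```python
def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

A = [[5, 12, -6], [-3, -10, 6], [-3, -12, 8]]    # matrix A of Fig. 2
S = [[-4, 2, -1], [1, 0, 1], [0, 1, 1]]          # columns (-4,1,0)^T, (2,0,1)^T, (-1,1,1)^T
D = [[2, 0, 0], [0, 2, 0], [0, 0, -1]]

# With det(S) != 0, the identity A S = S D is equivalent to A = S D S^{-1}.
print(mat_mul(A, S) == mat_mul(S, D), det3(S), det3(A))
# prints True 1 -4
```

Since $\det(S) = 1$, $S^{-1}$ is an integer matrix here, so every perturbed variant of A with rational perturbations is again rational.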
Interestingly, the reduction technique we develop applies to properties much more general than ENN_SoM and EP_SoM. Formally, given a sequence of matrices $B_n$ defined by $\sum_{i=1}^{m} w_iA_i^n$, we say that a property P of the sequence is positive scaling invariant if it stays unchanged even if we scale all $A_i$s by the same positive real. Such properties include ENN_SoM and EP_SoM, as well as the following:

- non-negativity and positivity of $B_n$ (i.e. is $B_n[i,j] \geq 0$ or $> 0$, as the case may be, for all $n \geq 1$ and for all $1 \leq i, j \leq k$);
- existence of zero (i.e. is $B_n$ equal to the all-0 matrix for some $n \geq 1$);
- existence of a zero element (i.e. is $B_n[i,j] = 0$ for some $n \geq 1$ and some $i, j \in \{1, \ldots, k\}$);
- variants of the r-non-negativity (resp. r-positivity and r-zero) problem (i.e. do there exist at least/exactly/at most r non-negative (resp. positive/zero) elements in $B_n$ for all $n \geq 1$, for a given $r \in [1, k]$);

and so on. The main result of this section is a reduction for deciding such properties, formalized in the following theorem.


Theorem 5. The decision problem for every positive scaling invariant property 
on rational diagonalizable matrices effectively reduces to the decision problem for 
the property on real algebraic simple matrices. 


While we defer the proof of this theorem to later in the section, an immediate 
consequence of Theorem 5 and Theorem 4(1) (read with the note at the end of 
Sect. 5.1) is the following result. 


Corollary 2. ENN_SoM and EP_SoM are decidable for $\mathfrak{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$ if all $A_i$'s are rational diagonalizable matrices and all $w_i$'s are rational.


It is important to note that Theorem 5 yields a decision procedure for checking any positive scaling invariant property of diagonalizable matrices from a corresponding decision procedure for real algebraic simple matrices. It makes no assumptions about the inner workings of the latter procedure. Given any black-box decision procedure for checking a positive scaling invariant property for a set of weighted simple matrices, our reduction shows how to construct a decision procedure for the same property for a set of weighted diagonalizable matrices. Diagonalizable matrices have an exponential form solution with constant coefficients for the exponential terms. Hence one could also handle them directly with an algorithm that exploits this specific property of the exponential form, such as Ouaknine and Worrell's algorithm [31], originally proposed for checking ultimate positivity of simple LRS. However, our reduction technique is neither specific to this algorithm nor does it rely on any special property of the exponential form of the solution.

The proof of Theorem 5 crucially relies on the notion of perturbation of diagonalizable matrices, which we introduce first. Let A be a $k \times k$ real diagonalizable matrix. Then there exist an invertible $k \times k$ matrix S and a diagonal $k \times k$ matrix D such that $A = SDS^{-1}$, where S and D may have complex entries. It follows from basic linear algebra that for every $i \in \{1, \ldots, k\}$, $D[i,i]$ is an eigenvalue of A. Moreover, if $\alpha$ is an eigenvalue of A with algebraic multiplicity $\rho$, then $\alpha$ appears exactly $\rho$ times along the diagonal of D. Furthermore, for every $i \in \{1, \ldots, k\}$, the $i^{th}$ column of S (resp. $i^{th}$ row of $S^{-1}$) is an eigenvector of A (resp. of $A^T$) corresponding to the eigenvalue $D[i,i]$. The columns of S (resp. rows of $S^{-1}$) form a basis of the vector space $\mathbb{C}^k$.

Let $\alpha_1, \ldots, \alpha_m$ be the eigenvalues of A with algebraic multiplicities $\rho_1, \ldots, \rho_m$ respectively. Wlog, we assume that $\rho_1 \geq \ldots \geq \rho_m$ and that the diagonal of D is partitioned into segments as follows: the first $\rho_1$ entries along the diagonal are $\alpha_1$, the next $\rho_2$ entries are $\alpha_2$, and so on. We refer to these segments as the $\alpha_1$-segment, $\alpha_2$-segment and so on, of the diagonal of D. Formally, let $\kappa_i$ denote $\sum_{j=1}^{i-1}\rho_j$. Then the $\alpha_i$-segment of the diagonal of D consists of the entries $D[\kappa_i+1, \kappa_i+1], \ldots, D[\kappa_i+\rho_i, \kappa_i+\rho_i]$, all of which are $\alpha_i$.

Since A is a real matrix, its characteristic polynomial has all real coefficients. Hence, for every eigenvalue $\alpha$ of A (and hence of $A^T$), its complex conjugate, denoted $\overline{\alpha}$, is also an eigenvalue of A (and hence of $A^T$) with the same algebraic multiplicity. This allows us to define a bijection $h_D$ from $\{1, \ldots, k\}$ to $\{1, \ldots, k\}$ as follows. If $D[i,i]$ is real, then $h_D(i) = i$. Otherwise, let $D[i,i] = \alpha \in \mathbb{C}$ be the $l^{th}$ element in the $\alpha$-segment of the diagonal of D. Then $h_D(i) = j$, where $D[j,j]$ is the $l^{th}$ element in the $\overline{\alpha}$-segment of the diagonal of D.

The matrix A being real also implies that for every real eigenvalue $\alpha$ of A (resp. of $A^T$), the corresponding eigenspace has a basis of real eigenvectors. Additionally, consider a non-real eigenvalue $\alpha$ and any set of eigenvectors of A (resp. of $A^T$) that forms a basis of the eigenspace corresponding to $\alpha$. The component-wise complex conjugates of these basis vectors are eigenvectors of A (resp. of $A^T$) and form a basis of the eigenspace corresponding to $\overline{\alpha}$.

Using the above notation, we choose the matrix $S^{-1}$ (and hence S) such that $A = SDS^{-1}$ as follows. Suppose $\alpha$ is an eigenvalue of A (and hence of $A^T$) with algebraic multiplicity $\rho$. Let $\{i+1, \ldots, i+\rho\}$ be the set of indices j for which $D[j,j] = \alpha$. If $\alpha$ is real (resp. complex), the $(i+1)^{th}, \ldots, (i+\rho)^{th}$ rows of $S^{-1}$ are chosen to be real (resp. complex) eigenvectors of $A^T$ that form a basis of the eigenspace corresponding to $\alpha$. Moreover, if $\alpha$ is complex, the $h_D(i+s)^{th}$ row of $S^{-1}$ is chosen to be the component-wise complex conjugate of the $(i+s)^{th}$ row of $S^{-1}$, for all $s \in \{1, \ldots, \rho\}$.


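Definition 3. Let $A = SDS^{-1}$ be a $k \times k$ real diagonalizable matrix. We say that $\mathcal{E} = (\varepsilon_1, \ldots, \varepsilon_k) \in \mathbb{R}^k$ is a perturbation w.r.t. D if $\varepsilon_i \neq 0$ and $\varepsilon_i = \varepsilon_{h_D(i)}$ for all $i \in \{1, \ldots, k\}$. Further, the $\mathcal{E}$-perturbed variant of A is the matrix $A' = SD'S^{-1}$, where $D'$ is the $k \times k$ diagonal matrix with $D'[i,i] = \varepsilon_iD[i,i]$ for all $i \in \{1, \ldots, k\}$.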


In the following, we omit "w.r.t. D" and simply say "$\mathcal{E}$ is a perturbation" when D is clear from the context. Clearly, $A'$ as defined above is a diagonalizable matrix and its eigenvalues are given by the diagonal elements of $D'$.

Recall that the diagonal of D is partitioned into $\alpha_i$-segments, where each $\alpha_i$ is an eigenvalue of $A = SDS^{-1}$ with algebraic multiplicity $\rho_i$. We now use a similar idea to segment a perturbation $\mathcal{E}$ w.r.t. D. Specifically, the first $\rho_1$ elements of $\mathcal{E}$ constitute the $\alpha_1$-segment of $\mathcal{E}$, the next $\rho_2$ elements of $\mathcal{E}$ constitute the $\alpha_2$-segment of $\mathcal{E}$, and so on.


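Definition 4. A perturbation $\mathcal{E} = (\varepsilon_1, \ldots, \varepsilon_k)$ is said to be segmented if, for all $1 \leq j \leq \rho_1$, the $j^{th}$ element (whenever present) in every segment of $\mathcal{E}$ has the same value. Formally, if $i = \sum_{s=1}^{l-1}\rho_s + j$ and $1 \leq j \leq \rho_l \leq \rho_1$, then $\varepsilon_i = \varepsilon_j$.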
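Clearly, the first $\rho_1$ elements of a segmented perturbation $\mathcal{E}$ define the whole of $\mathcal{E}$. As an example, suppose $(\alpha_1, \alpha_1, \alpha_1, \alpha_2, \alpha_2, \overline{\alpha_2}, \overline{\alpha_2}, \alpha_3)$ is the diagonal of D, where $\alpha_1$, $\alpha_2$, $\overline{\alpha_2}$ and $\alpha_3$ are distinct eigenvalues of A. There are four segments of the diagonal of D (and of $\mathcal{E}$), of lengths 3, 2, 2 and 1 respectively.

Example segmented perturbations in this case are $(\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_1, \varepsilon_2, \varepsilon_1)$ and $(\varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_3, \varepsilon_1, \varepsilon_3)$. If $\varepsilon_1 \neq \varepsilon_2$ or $\varepsilon_2 \neq \varepsilon_3$, a perturbation that is not segmented is $\mathcal{E} = (\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_2, \varepsilon_3, \varepsilon_2, \varepsilon_3, \varepsilon_1)$.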


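Definition 5. Given a segmented perturbation $\mathcal{E} = (\varepsilon_1, \ldots, \varepsilon_k)$ w.r.t. D, a rotation of $\mathcal{E}$, denoted $\tau_D(\mathcal{E})$, is the segmented perturbation $\mathcal{E}' = (\varepsilon'_1, \ldots, \varepsilon'_k)$ in which $\varepsilon'_{(i \bmod \rho_1)+1} = \varepsilon_i$ for $i \in \{1, \ldots, \rho_1\}$, and all other $\varepsilon'_i$s are as in Definition 4.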


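Continuing with our example, if $\mathcal{E} = (\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_1, \varepsilon_2, \varepsilon_1)$, then $\tau_D(\mathcal{E}) = (\varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_3, \varepsilon_1, \varepsilon_3)$, $\tau_D^2(\mathcal{E}) = (\varepsilon_2, \varepsilon_3, \varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_2, \varepsilon_3, \varepsilon_2)$ and $\tau_D^3(\mathcal{E}) = \mathcal{E}$.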
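The bookkeeping of Definitions 4 and 5 can be reproduced with a small sketch. The helper names are ours, and perturbation values are represented by symbolic strings. The sketch expands the first segment into a full segmented perturbation and rotates it. It also checks the property used in the proof of Lemma 4: at every position, the $\rho_1$ rotations take each of $\varepsilon_1, \ldots, \varepsilon_{\rho_1}$ exactly once.

```python
def segmented(first, seg_lengths):
    """Definition 4: the j-th entry of every segment equals first[j].
    seg_lengths must be non-increasing, with seg_lengths[0] == len(first)."""
    if seg_lengths[0] != len(first) or sorted(seg_lengths, reverse=True) != seg_lengths:
        raise ValueError("segments must be non-increasing and led by the first segment")
    return [first[j] for length in seg_lengths for j in range(length)]

def rotate(first):
    """Definition 5 on the first segment: eps'_{(i mod rho_1) + 1} = eps_i (a cyclic right shift)."""
    return first[-1:] + first[:-1]

def is_segmented(eps, seg_lengths):
    """True iff eps is determined by its first segment as in Definition 4."""
    return eps == segmented(eps[:seg_lengths[0]], seg_lengths)

lengths = [3, 2, 2, 1]          # diagonal (a1, a1, a1, a2, a2, conj(a2), conj(a2), a3)
first = ["e1", "e2", "e3"]
for u in range(4):              # prints E, tau(E), tau^2(E) and tau^3(E) = E
    print(segmented(first, lengths))
    first = rotate(first)
```

The printed lines match the example above, and the fourth line equals the first, reflecting $\tau_D^3(\mathcal{E}) = \mathcal{E}$.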


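Lemma 4. Let $A = SDS^{-1}$ be a $k \times k$ real diagonalizable matrix with eigenvalues $\alpha_i$ of algebraic multiplicity $\rho_i$. Let $\mathcal{E} = (\varepsilon_1, \ldots, \varepsilon_k)$ be a segmented perturbation w.r.t. D such that all $\varepsilon_j$s have the same sign. Let $A_u$ denote the $\tau_D^u(\mathcal{E})$-perturbed variant of A for $0 \leq u < \rho_1$, where $\tau_D^0(\mathcal{E}) = \mathcal{E}$. Then $A^n = \frac{1}{\sum_{j=1}^{\rho_1}\varepsilon_j^n}\sum_{u=0}^{\rho_1-1}A_u^n$ for all $n \geq 1$.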


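Proof. Let $\mathcal{E}_u$ denote $\tau_D^u(\mathcal{E})$ for $0 \leq u < \rho_1$, and let $\mathcal{E}_u[i]$ denote the $i^{th}$ element of $\mathcal{E}_u$ for $1 \leq i \leq k$. It follows from Definitions 4 and 5 that for each $i, j \in \{1, \ldots, \rho_1\}$, there is a unique $u \in \{0, \ldots, \rho_1 - 1\}$ such that $\mathcal{E}_u[i] = \varepsilon_j$. Specifically, $u = i - j$ if $i \geq j$, and $u = (\rho_1 - j) + i$ if $i < j$. Furthermore, Definition 4 ensures that this property holds not only for $i \in \{1, \ldots, \rho_1\}$, but for all $i \in \{1, \ldots, k\}$.

Let $D_u$ denote the diagonal matrix with $D_u[i,i] = \mathcal{E}_u[i]\,D[i,i]$ for $1 \leq i \leq k$. Then $D_u^n$ is the diagonal matrix with $D_u^n[i,i] = (\mathcal{E}_u[i]\,D[i,i])^n$ for all $n \geq 1$. It follows from the definition of $A_u$ that $A_u^n = SD_u^nS^{-1}$ for $0 \leq u < \rho_1$ and $n \geq 1$. Therefore, $\sum_{u=0}^{\rho_1-1}A_u^n = S\left(\sum_{u=0}^{\rho_1-1}D_u^n\right)S^{-1}$. Now, $\sum_{u=0}^{\rho_1-1}D_u^n$ is a diagonal matrix whose $i^{th}$ element along the diagonal is $\sum_{u=0}^{\rho_1-1}(\mathcal{E}_u[i]\,D[i,i])^n = \left(\sum_{u=0}^{\rho_1-1}\mathcal{E}_u^n[i]\right)D^n[i,i]$. By the property mentioned in the previous paragraph, $\sum_{u=0}^{\rho_1-1}\mathcal{E}_u^n[i] = \sum_{j=1}^{\rho_1}\varepsilon_j^n$ for $1 \leq i \leq k$. Therefore, $\sum_{u=0}^{\rho_1-1}D_u^n = \left(\sum_{j=1}^{\rho_1}\varepsilon_j^n\right)D^n$, and hence $\sum_{u=0}^{\rho_1-1}A_u^n = \left(\sum_{j=1}^{\rho_1}\varepsilon_j^n\right)SD^nS^{-1} = \left(\sum_{j=1}^{\rho_1}\varepsilon_j^n\right)A^n$.

Since all $\varepsilon_j$s have the same sign and are non-zero, $\sum_{j=1}^{\rho_1}\varepsilon_j^n$ is non-zero for all $n \geq 1$. It follows that $A^n = \frac{1}{\sum_{j=1}^{\rho_1}\varepsilon_j^n}\sum_{u=0}^{\rho_1-1}A_u^n$.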
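Lemma 4 can be checked exactly on the matrix of Fig. 2. The following sketch is our own; the values $\varepsilon_1 = 1/2$ and $\varepsilon_2 = 3$ are arbitrary choices. Here $D = \mathrm{diag}(2, 2, -1)$ has segments of lengths 2 and 1, so $\rho_1 = 2$, $\mathcal{E} = (\varepsilon_1, \varepsilon_2, \varepsilon_1)$ and $\tau_D(\mathcal{E}) = (\varepsilon_2, \varepsilon_1, \varepsilon_2)$. The sketch computes the two perturbed variants with an exact Gauss-Jordan inverse and compares $A_0^n + A_1^n$ with $(\varepsilon_1^n + \varepsilon_2^n)A^n$.

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(X, n):
    R = [[Fraction(int(i == j)) for j in range(len(X))] for i in range(len(X))]
    for _ in range(n):
        R = mat_mul(R, X)
    return R

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def perturbed_variant(S, diag, eps):
    """Definition 3: S * diag(eps_i * D[i,i]) * S^{-1}."""
    k = len(diag)
    Dp = [[Fraction(eps[i]) * diag[i] if i == j else Fraction(0) for j in range(k)]
          for i in range(k)]
    return mat_mul(mat_mul(S, Dp), inverse(S))

A = [[5, 12, -6], [-3, -10, 6], [-3, -12, 8]]      # Fig. 2
S = [[-4, 2, -1], [1, 0, 1], [0, 1, 1]]
diag = [2, 2, -1]                                   # segments of lengths 2 and 1, rho_1 = 2
e1, e2 = Fraction(1, 2), Fraction(3)
A0 = perturbed_variant(S, diag, [e1, e2, e1])      # E = (e1, e2, e1)
A1 = perturbed_variant(S, diag, [e2, e1, e2])      # tau_D(E) = (e2, e1, e2)
n = 4
lhs = mat_add(mat_pow(A0, n), mat_pow(A1, n))
rhs = [[(e1 ** n + e2 ** n) * x for x in row] for row in mat_pow(A, n)]
print(lhs == rhs)                                   # prints True
```

Both variants are simple, with eigenvalues $(1, 6, -1/2)$ and $(6, 1, -3)$. This illustrates how repeated eigenvalues are separated while the powers of A remain recoverable up to a positive factor.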


We are now in a position to present the proof of the main result of this 
section, i.e. of Theorem 5. Our proof uses a variation of the idea used in the 
proof of Lemma 4 above. 


Proof of Theorem 5. Consider a set $\{(w_1, A_1), \ldots, (w_m, A_m)\}$ of (weight, matrix) pairs, where each matrix $A_i$ is in $\mathbb{Q}^{k\times k}$ and each $w_i \in \mathbb{Q}$. Suppose further that each $A_i = S_iD_iS_i^{-1}$, where $D_i$ is a diagonal matrix with segments along the diagonal arranged in descending order of algebraic multiplicities of the corresponding eigenvalues. Let $\nu_i$ be the number of distinct eigenvalues of $A_i$, and let these eigenvalues be $\alpha_{i,1}, \ldots, \alpha_{i,\nu_i}$. Let $\mu_i$ be the largest algebraic multiplicity among those of all eigenvalues of $A_i$, and let $\mu = \mathrm{lcm}(\mu_1, \ldots, \mu_m)$.

We now choose positive rationals $\varepsilon_1, \ldots, \varepsilon_\mu$ satisfying two conditions. First, all $\varepsilon_j$s are distinct. Second, for every $i \in \{1, \ldots, m\}$, every pair of distinct $j, l \in \{1, \ldots, \nu_i\}$ and every pair of distinct $p, q \in \{1, \ldots, \mu\}$, we have $\frac{\varepsilon_p}{\varepsilon_q} \neq \left|\frac{\alpha_{i,j}}{\alpha_{i,l}}\right|$. Since $\mathbb{Q}$ is dense, such a choice of $\varepsilon_1, \ldots, \varepsilon_\mu$ can always be made once all $\alpha_{i,j}$s are known, even if only within finite precision bounds.

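For $1 \leq i \leq m$, let $\eta_i$ denote $\mu/\mu_i$. We now define $\eta_i$ distinct segmented perturbations w.r.t. $D_i$, denoted $\mathcal{E}_{i,1}, \ldots, \mathcal{E}_{i,\eta_i}$, as follows. For $1 \leq j \leq \eta_i$, the first $\mu_i$ elements (i.e. the first segment) of $\mathcal{E}_{i,j}$ are $\varepsilon_{(j-1)\mu_i+1}, \ldots, \varepsilon_{j\mu_i}$ (as chosen in the previous paragraph). All other elements of $\mathcal{E}_{i,j}$ are defined as in Definition 4. For each $\mathcal{E}_{i,j}$ thus obtained, we also consider its rotations $\tau_{D_i}^u(\mathcal{E}_{i,j})$ for $0 \leq u < \mu_i$. For $1 \leq j \leq \eta_i$ and $0 \leq u < \mu_i$, let $A_{i,j,u} = S_iD_{i,j,u}S_i^{-1}$ denote the $\tau_{D_i}^u(\mathcal{E}_{i,j})$-perturbed variant of $A_i$.

Now consider the set of diagonal matrices $\{D_{i,j,u} \mid 1 \leq j \leq \eta_i, 0 \leq u < \mu_i\}$. It follows from Definition 3 that for every $p \in \{1, \ldots, k\}$ and every $q \in \{1, \ldots, \mu\}$, there are a unique u and j such that $D_{i,j,u}[p,p] = \varepsilon_qD_i[p,p]$. Specifically, $j = \lceil q/\mu_i \rceil$. To find u, let $\mathcal{E}_{i,j}[p]$ be the $P^{th}$ element in a segment of $\mathcal{E}_{i,j}$, where $1 \leq P \leq \mu_i$. Let $\hat{q} = ((q-1) \bmod \mu_i) + 1$ be the position of $\varepsilon_q$ within the first segment of $\mathcal{E}_{i,j}$. Then $u = P - \hat{q}$ if $P \geq \hat{q}$, and $u = (\mu_i - \hat{q}) + P$ otherwise.

By our choice of the $\varepsilon_j$s, we also know the following: for all $i \in \{1, \ldots, m\}$, all $j, l \in \{1, \ldots, \nu_i\}$ and all $p, q \in \{1, \ldots, \mu\}$, we have $\varepsilon_p\alpha_{i,j} \neq \varepsilon_q\alpha_{i,l}$ unless $p = q$ and $j = l$. This ensures that all $D_{i,j,u}$ matrices, and hence all $A_{i,j,u}$ matrices, are simple, i.e. have distinct eigenvalues.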

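Using the reasoning in Lemma 4, we can now show that $A_i^n = \frac{1}{\sum_{j=1}^{\mu}\varepsilon_j^n}\sum_{j=1}^{\eta_i}\sum_{u=0}^{\mu_i-1}A_{i,j,u}^n$. Consequently, $\sum_{i=1}^{m}w_iA_i^n = \frac{1}{\sum_{j=1}^{\mu}\varepsilon_j^n}\left(\sum_{i=1}^{m}\sum_{j=1}^{\eta_i}\sum_{u=0}^{\mu_i-1}w_iA_{i,j,u}^n\right)$. Since all $\varepsilon_j$s are positive reals, $\sum_{j=1}^{\mu}\varepsilon_j^n$ is a positive real for all $n \geq 1$.

Hence, for each $p, q \in \{1, \ldots, k\}$, $\sum_{i=1}^{m}w_iA_i^n[p,q]$ is $> 0$, $< 0$ or $= 0$ if and only if $\left(\sum_{i=1}^{m}\sum_{j=1}^{\eta_i}\sum_{u=0}^{\mu_i-1}w_iA_{i,j,u}^n[p,q]\right)$ is $> 0$, $< 0$ or $= 0$, respectively. The only remaining helper result needed to complete the proof of the theorem is that each $A_{i,j,u}$ is a real algebraic matrix. This is shown in Lemma 5, presented at the end of this section so as to minimally disturb the flow of arguments.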


The reduction in the proof of Theorem 5 can easily be encoded as an algorithm, as shown in Algorithm 1. Besides Corollary 2, our reduction has other consequences. One such result (with proof in [3]) is given below.


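Corollary 3. Let $\mathfrak{A} = \{(w_1, A_1), \ldots, (w_m, A_m)\}$, where each $w_i \in \mathbb{Q}$ and each $A_i \in \mathbb{Q}^{k\times k}$ is diagonalizable, and let $\varepsilon > 0$ be a real value. Then there exists $\mathfrak{B} = \{(v_1, B_1), \ldots, (v_m, B_m)\}$, where each $v_j \in \mathbb{Q}$ and each $B_j \in \mathbb{RA}^{k\times k}$ is simple, such that $\left|\sum_{i=1}^{m}w_iA_i^n[p,q] - \sum_{j=1}^{m}v_jB_j^n[p,q]\right| < \varepsilon^n$ for all $p, q \in \{1, \ldots, k\}$ and all $n \geq 1$.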


We end this section with the promised helper result used at the end of the proof 
of Theorem 5. 
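Algorithm 1. Reduction procedure for diagonalizable matrices

Input: $\mathfrak{A} = \{(w_i, A_i) : 1 \leq i \leq m,\ w_i \in \mathbb{Q},\ A_i \in \mathbb{Q}^{k\times k}$ and diagonalizable$\}$
Output: $\mathfrak{B} = \{(v_i, B_i) : 1 \leq i \leq t,\ v_i \in \mathbb{Q},\ B_i \in \mathbb{RA}^{k\times k}$ are simple$\}$
    s.t. $\sum_{i=1}^{m} w_iA_i^n = f(n)\left(\sum_{i=1}^{t} v_iB_i^n\right)$, where $f(n) > 0$ for all $n > 0$

$P \leftarrow \{1\}$;    ▷ Initialize set of forbidden ratios of various $\varepsilon_j$s
for i in 1 through m do    ▷ For each matrix $A_i$
    $R_i \leftarrow \{(\alpha_{i,j}, \rho_{i,j}) : \alpha_{i,j}$ is an eigenvalue of $A_i$ with algebraic multiplicity $\rho_{i,j}\}$;
    $D_i \leftarrow$ Diagonal matrix of $\alpha_{i,j}$-segments ordered in decreasing order of $\rho_{i,j}$;
    $S_i \leftarrow$ Matrix of linearly independent eigenvectors of $A_i$ s.t. $A_i = S_iD_iS_i^{-1}$;
    $P \leftarrow P \cup \{|\alpha_{i,j}/\alpha_{i,l}| : \alpha_{i,j}, \alpha_{i,l}$ are eigenvalues in $R_i\}$;  $\mu_i \leftarrow \max_j \rho_{i,j}$;
$\mu \leftarrow \mathrm{lcm}(\mu_1, \ldots, \mu_m)$;    ▷ Count of $\varepsilon_j$s needed
for j in 1 through $\mu$ do    ▷ Generate all required $\varepsilon_j$s
    Choose $\varepsilon_j \in \mathbb{Q}$ s.t. $\varepsilon_j > 0$ and $\varepsilon_j \notin \{r\varepsilon_p : 1 \leq p < j,\ r \in P\}$;
$\mathfrak{B} \leftarrow \emptyset$;    ▷ Initialize set of (weight, simple matrix) pairs
for i in 1 through m do    ▷ For each matrix $A_i$
    $\nu_i \leftarrow \mu/\mu_i$;    ▷ Count of segmented perturbations to be rotated for $A_i$
    for j in 0 through $\nu_i - 1$ do    ▷ For each segmented perturbation
        $\mathcal{E}_{i,j} \leftarrow$ Seg. perturbn. w.r.t. $D_i$ with first $\mu_i$ elements being $\varepsilon_{j\mu_i+1}, \ldots, \varepsilon_{(j+1)\mu_i}$;
        for u in 0 through $\mu_i - 1$ do    ▷ For each rotation of $\mathcal{E}_{i,j}$
            $A_{i,j,u} \leftarrow \tau_{D_i}^u(\mathcal{E}_{i,j})$-perturbed variant of $A_i$;
            $\mathfrak{B} \leftarrow \mathfrak{B} \cup \{(w_i, A_{i,j,u})\}$;    ▷ Update $\mathfrak{B}$
return $\mathfrak{B}$;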
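The steps of Algorithm 1 can be sketched in Python under simplifying assumptions that are ours, not the paper's. First, each input matrix comes with a precomputed decomposition: the matrix $S_i$ and the diagonal of $D_i$, already arranged in segments of non-increasing length. Second, all eigenvalues are non-zero rationals, so $h_D$ is the identity and no algebraic-number arithmetic is needed. Third, the $\varepsilon_j$s are chosen greedily among positive integers. All helper names are hypothetical. The example uses the matrix of Fig. 2 together with $3I$, so that $\mu = \mathrm{lcm}(2, 3) = 6$.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(X, n):
    R = [[Fraction(int(i == j)) for j in range(len(X))] for i in range(len(X))]
    for _ in range(n):
        R = mat_mul(R, X)
    return R

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def segment_lengths(diag):
    """Lengths of the maximal runs of equal consecutive diagonal entries."""
    lengths = [1]
    for prev, cur in zip(diag, diag[1:]):
        if cur == prev:
            lengths[-1] += 1
        else:
            lengths.append(1)
    return lengths

def choose_eps(eigen_sets, mu):
    """Forbidden-ratio set and greedy choice of eps_1..eps_mu among positive integers."""
    forbidden = {Fraction(1)}
    for eigs in eigen_sets:
        forbidden |= {abs(Fraction(a) / Fraction(b)) for a in eigs for b in eigs}
    eps = []
    while len(eps) < mu:
        cand = Fraction(1)
        while any(cand == r * e for r in forbidden for e in eps):
            cand += 1
        eps.append(cand)
    return eps

def reduce_diagonalizable(triples):
    """triples: [(w_i, S_i, diag_i)] with A_i = S_i diag(diag_i) S_i^{-1}.
    Returns (eps, B), where B lists (v, B_matrix, eigenvalues of B_matrix)."""
    segs = [segment_lengths(d) for _, _, d in triples]
    mu = reduce(lambda a, b: a * b // gcd(a, b), (s[0] for s in segs), 1)
    eps = choose_eps([set(d) for _, _, d in triples], mu)
    B = []
    for (w, S, diag), seg in zip(triples, segs):
        mu_i, S_inv = seg[0], inverse(S)          # seg[0] is the largest multiplicity
        for j in range(mu // mu_i):
            first = eps[j * mu_i:(j + 1) * mu_i]
            for u in range(mu_i):
                rot = first[mu_i - u:] + first[:mu_i - u]            # u-fold rotation (Def. 5)
                E = [rot[p] for length in seg for p in range(length)]  # Def. 4
                eig = [e * Fraction(d) for e, d in zip(E, diag)]
                Dp = [[eig[r] if r == c else Fraction(0) for c in range(len(eig))]
                      for r in range(len(eig))]
                B.append((Fraction(w), mat_mul(mat_mul(S, Dp), S_inv), eig))
    return eps, B

A = [[5, 12, -6], [-3, -10, 6], [-3, -12, 8]]                          # Fig. 2
triples = [(1, [[-4, 2, -1], [1, 0, 1], [0, 1, 1]], [2, 2, -1]),
           (Fraction(-1, 2), [[1, 0, 0], [0, 1, 0], [0, 0, 1]], [3, 3, 3])]  # second matrix: 3I
eps, B = reduce_diagonalizable(triples)
print([int(e) for e in eps], len(B))        # prints [1, 3, 4, 5, 7, 9] 12
```

In this sketch $f(n) = 1/\sum_{j=1}^{\mu}\varepsilon_j^n$, matching the proof of Theorem 5. Every output matrix has three distinct eigenvalues.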


Lemma 5. For every real (resp. real algebraic) diagonalizable matrix $A = SDS^{-1}$ and perturbation $\mathcal{E} \in \mathbb{R}^k$ (resp. $\mathbb{RA}^k$), the $\mathcal{E}$-perturbed variant of A is a real (resp. real algebraic) diagonalizable matrix.


Proof. We first consider the case of $A \in \mathbb{R}^{k\times k}$ and $\mathcal{E} \in \mathbb{R}^k$. Given a perturbation $\mathcal{E}$ w.r.t. D, we first define k simple perturbations $\mathcal{E}_i$ ($1 \leq i \leq k$) w.r.t. D as follows. All components of $\mathcal{E}_i$ are set to 1, except for the $i^{th}$ component, which is set to $\varepsilon_i$. Furthermore, if $D[i,i]$ is not real, then the $h_D(i)^{th}$ component of $\mathcal{E}_i$ is also set to $\varepsilon_i$. It is easy to see from Definition 3 that each $\mathcal{E}_i$ is a perturbation w.r.t. D. Moreover, if $j = h_D(i)$, then $\mathcal{E}_j = \mathcal{E}_i$.

Let $\widehat{E} = \{\mathcal{E}_{i_1}, \ldots, \mathcal{E}_{i_u}\}$ be the set of all distinct perturbations w.r.t. D among $\mathcal{E}_1, \ldots, \mathcal{E}_k$. It follows once again from Definition 3 that the $\mathcal{E}$-perturbed variant of A can be obtained by a sequence of $\mathcal{E}_{i_v}$-perturbations, where $\mathcal{E}_{i_v} \in \widehat{E}$. Specifically, let $A_{0,\mathcal{E}} = A$, and let $A_{v,\mathcal{E}}$ be the $\mathcal{E}_{i_v}$-perturbed variant of $A_{v-1,\mathcal{E}}$ for all $v \in \{1, \ldots, u\}$. Then the $\mathcal{E}$-perturbed variant of A is identical to $A_{u,\mathcal{E}}$. This shows that it suffices to prove the lemma only for the simple perturbations $\mathcal{E}_i$ defined above. We focus on this special case below.

Let $A' = SD'S^{-1}$ be the $\mathcal{E}_i$-perturbed variant of A, and let $D[i,i] = \alpha$. For every $p \in \{1, \ldots, k\}$, let $\mathbf{e}_p$ denote the k-dimensional unit vector whose $p^{th}$ component is 1. Then $A'\mathbf{e}_p$ gives the $p^{th}$ column of $A'$. We prove the first part of the lemma by showing that $A'\mathbf{e}_p = (SD'S^{-1})\mathbf{e}_p \in \mathbb{R}^{k\times 1}$ for all $p \in \{1, \ldots, k\}$.
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Let T denote D’ S~' ep. Then T is a column vector with T[r] = 
D'\r,r| S~"[r,p] for all r € {1,...k}. Let U denote ST. By definition, U is 
the p” column of the matrix A’. To compute U, recall that the rows of S71 
form a basis of C*. Therefore, for every q € {1,...k}, ST} eq can be viewed as 
transforming the basis of the unit vector eg to that given by the rows of S~! 
(modulo possible scaling by real scalars denoting the lengths of the row vectors of 
S~+). Similarly, computation of U = ST can be viewed as applying the inverse 
basis transformation to T. It follows that the components of U can be obtained 
by computing the dot product of T and the transformed unit vector S~! eg, for 
each q € {1,...k}. In other words, Ufq] = T- (97! eq). We show below that 
each such Ufq] is real. 

By definition, $U[q] = \sum_{r=1}^{k} T[r]\, S^{-1}[r,q] = \sum_{r=1}^{k} D'[r,r]\, S^{-1}[r,p]\, S^{-1}[r,q]$. We consider two cases below.


– If $D[i,i] = a$ is real, recalling the definition of $D'$, the expression for $U[q]$ simplifies to $\sum_{r=1}^{k} D[r,r]\, S^{-1}[r,p]\, S^{-1}[r,q] + (\varepsilon_i - 1)\, a\, S^{-1}[i,p]\, S^{-1}[i,q]$. Note that $\sum_{r=1}^{k} D[r,r]\, S^{-1}[r,p]\, S^{-1}[r,q]$ is the $q$-th component of the vector $(SDS^{-1}) e_p = A e_p$. Since $A$ is real, so must be the $q$-th component of $A e_p$. Moreover, since $a$ is real, by our choice of $S^{-1}$, both $S^{-1}[i,p]$ and $S^{-1}[i,q]$ are real. Since $\varepsilon_i$ is also real, it follows that $(\varepsilon_i - 1)\, a\, S^{-1}[i,p]\, S^{-1}[i,q]$ is real. Hence $U[q]$ is real for all $q \in \{1, \ldots, k\}$.

– If $D[i,i] = a$ is not real, from Definition 3, we know that $D'[i,i] = \varepsilon_i\, a$ and $D'[hp(i),hp(i)] = \varepsilon_i\, \bar{a}$. The expression for $U[q]$ then simplifies to $\sum_{r=1}^{k} D[r,r]\, S^{-1}[r,p]\, S^{-1}[r,q] + (\varepsilon_i - 1)(\beta + \gamma)$, where $\beta = a\, S^{-1}[i,p]\, S^{-1}[i,q]$ and $\gamma = \bar{a}\, S^{-1}[hp(i),p]\, S^{-1}[hp(i),q]$. By our choice of $S^{-1}$, we know that $S^{-1}[hp(i),p] = \overline{S^{-1}[i,p]}$ and $S^{-1}[hp(i),q] = \overline{S^{-1}[i,q]}$. Therefore, $\beta = \bar{\gamma}$ and hence $(\varepsilon_i - 1)(\beta + \gamma)$ is real. By a similar argument as in the previous case, it follows that $U[q]$ is real for all $q \in \{1, \ldots, k\}$.


The proof when $A \in \mathbb{RA}^{k \times k}$ and $\mathcal{E} \in \mathbb{Q}^k$ follows from similar reasoning as above, and from the following facts about real algebraic matrices.


— If A is a real algebraic matrix, then every eigenvalue of A is either a real or 
complex algebraic number. 

— If A is diagonalizable, then for every real (resp. complex) algebraic eigenvalue 
of A, there exists a set of real (resp. complex) algebraic eigenvectors that form 
a basis of the corresponding eigenspace. 
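The key step of the proof, namely that scaling a pair of conjugate eigenvalues by the same real factor keeps $SD'S^{-1}$ real, can be checked numerically. The Python sketch below is only a floating-point illustration, not part of the proof: the matrix $S$, the eigenvalues and the factors are arbitrary choices of ours, and the helper names are hypothetical. It builds $A = SDS^{-1}$ from a real eigenvector and a conjugate pair of eigenvectors, then compares a perturbation that gives $D[i,i]$ and $D[hp(i),hp(i)]$ the same factor (as Definition 3 requires) with one that does not.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def mat_inv(X):
    """Inverse of a square complex matrix via Gauss-Jordan elimination."""
    n = len(X)
    # Augment X with the identity matrix and reduce the left half to I.
    M = [[complex(v) for v in row] + [complex(r == c) for c in range(n)]
         for r, row in enumerate(X)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def perturbed_variant(S, eigs, eps):
    """Return S D' S^{-1} with D' = diag(eps[r] * eigs[r])."""
    n = len(eigs)
    D = [[eps[r] * eigs[r] if r == c else 0 for c in range(n)] for r in range(n)]
    return mat_mul(mat_mul(S, D), mat_inv(S))


def max_imag(M):
    """Largest absolute imaginary part of any entry of M."""
    return max(abs(complex(v).imag) for row in M for v in row)


# Column 0 is a real eigenvector; columns 1 and 2 form a conjugate pair.
v = [1 + 2j, -1j, 3 + 0j]
S = [[1, v[0], v[0].conjugate()],
     [2, v[1], v[1].conjugate()],
     [0, v[2], v[2].conjugate()]]
eigs = [2.0, 0.5 + 1j, 0.5 - 1j]   # eigs[2] is the conjugate of eigs[1]

A = perturbed_variant(S, eigs, [1, 1, 1])             # A itself
A_good = perturbed_variant(S, eigs, [0.5, 1.5, 1.5])  # pair scaled by the same factor
A_bad = perturbed_variant(S, eigs, [1, 1.5, 2.0])     # pair scaled differently
print(max_imag(A), max_imag(A_good), max_imag(A_bad))
```

The first two printed values are at the level of floating-point round-off, whereas the third is clearly non-zero, which illustrates why the two conjugate eigenvalues must receive the same factor.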


6 Conclusion 


In this paper, we investigated eventual non-negativity and positivity for matrices and the weighted sum of powers of matrices (ENN_SoM/EP_SoM). First, we showed reductions from and to specific problems on linear recurrences, which allowed us to give complexity lower and upper bounds. Second, we developed a new and generic perturbation-based reduction technique from simple matrices to diagonalizable matrices, which allowed us to transfer results between these settings.

Most of our results, which we showed in the rational setting, also hold for real-algebraic matrices, by adapting the complexity notions and relying on corresponding results for ultimate positivity of linear recurrences and related problems over the reals. As future work, we would like to extend our techniques to other problems of interest, such as the existence of a matrix power in which all entries are non-negative or zero. Finally, the line of work started here could lead to effective algorithms and applications in areas ranging from control theory to cyber-physical systems, where eventual properties of matrices play a crucial role.
Abstract. We consider a logic used to describe sets of configurations of distributed systems, whose network topologies can be changed at runtime by reconfiguration programs. The logic uses inductive definitions to describe networks with an unbounded number of components and interactions, written using a multiplicative conjunction, reminiscent of Bunched Implications [37] and Separation Logic [39]. We study the complexity of the satisfiability and entailment problems for the configuration logic under consideration. Additionally, we consider the robustness property of degree boundedness (is every component involved in a bounded number of interactions?), an ingredient for decidability of entailments.


1 Introduction 


Distributed systems are increasingly used as critical parts of the infrastructure of our 
digital society, as in e.g., datacenters, e-banking and social networking. In order to 
address maintenance (e.g., replacement of faulty and obsolete network nodes by new 
ones) and data traffic issues (e.g., managing the traffic inside a datacenter [35]), the 
distributed systems community has recently put massive effort into designing algorithms
for reconfigurable systems, whose network topologies change at runtime [23]. However,
dynamic reconfiguration in the form of software or network upgrades has been
recognized as one of the most important sources of cloud service outages [25].

This paper contributes to a logical framework that addresses the timely problems of 
formal modeling and verification of reconfigurable distributed systems. The basic build- 
ing blocks of this framework are (i) a Hoare-style program proof calculus [1] used to 
write formal proofs of correctness of reconfiguration programs, and (ii) an invariant syn- 
thesis method [6] that proves the safety (i.e., absence of reachable error configurations) 
of the configurations defined by the assertions that annotate a reconfiguration program. 
These methods are combined to prove that an initially correct distributed system cannot 
reach an error state, following the execution of a given reconfiguration sequence. 

The assertions of the proof calculus are written in a logic that defines infinite sets 
of configurations, consisting of components (i.e., processes running on different nodes 
of the network) connected by interactions (i.e., multi-party channels along which
messages between components are transferred). Systems that share the same architectural
style (e.g., pipeline, ring, star, tree, etc.) and differ by the number of components
and interactions are described using inductively defined predicates. Such configurations 
can be modified either by (a) adding or removing components and interactions (recon- 
figuration), or (b) changing the local states of components, by firing interactions. 
The assertion logic views components and interactions as resources that can be
created or deleted, in the spirit of resource logics à la Bunched Implications [37], or 
Separation Logic [39]. The main advantage of using resource logics is their support for 
local reasoning [12]: reconfiguration actions are specified by pre- and postconditions 
mentioning only the resources involved, while framing out the rest of the configuration. 

The price to pay for this expressive power is the difficulty of automating the rea- 
soning in these logics. This paper makes several contributions in the direction of proof 
automation, by studying the complexity of the satisfiability and entailment problems, 
for the configuration logic under consideration. Additionally, we study the complexity 
of a robustness property [27], namely degree boundedness (is every component involved 
in a bounded number of interactions?). In particular, the latter problem is used as a 
prerequisite for defining a fragment with a decidable entailment problem. For space 
reasons, the proofs of the technical results are given in [5]. 


1.1 Motivating Example 


The logic studied in this paper is motivated by the need for an assertion language 
that supports reasoning about dynamic reconfigurations in a distributed system. For 
instance, consider a distributed system consisting of a finite (but unknown) number of 
components (processes) placed in a ring, executing the same finite-state program and 
communicating via interactions that connect the out port of a component to the in port 
of its right neighbour, in a round-robin fashion, as in Fig. 1(a). The behavior of a com- 
ponent is a machine with two states, T and H, denoting whether the component has a 
token (T) or not (H). A component $c_i$ without a token may receive one, by executing a transition $H \xrightarrow{in} T$, simultaneously with its left neighbour $c_j$, which executes the transition $T \xrightarrow{out} H$. Then, we say that the interaction $(c_j, out, c_i, in)$ has fired, moving a token one position to the right in the ring. Note that there can be more than one token, moving independently in the system, as long as no token overtakes another token.
The token ring system is formally specified by the following inductive rules: 


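$\mathit{ring}_{h,t}(x) \leftarrow \exists y \exists z \,.\, [x]@q * \langle x.out, z.in \rangle * \mathit{chain}_{h',t'}(z,y) * \langle y.out, x.in \rangle$
$\mathit{chain}_{h,t}(x,y) \leftarrow \exists z \,.\, [x]@q * \langle x.out, z.in \rangle * \mathit{chain}_{h',t'}(z,y)$
$\mathit{chain}_{0,1}(x,x) \leftarrow [x]@T \qquad \mathit{chain}_{1,0}(x,x) \leftarrow [x]@H \qquad \mathit{chain}_{0,0}(x,x) \leftarrow [x]$

where $h' \triangleq \max(h-1,0)$ if $q = H$ and $h' \triangleq h$ if $q = T$, and $t' \triangleq \max(t-1,0)$ if $q = T$ and $t' \triangleq t$ if $q = H$.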
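The subscripts of ring and chain act as counters that the rules decrement as components are unfolded. The Python sketch below (the function names are ours, hypothetical) replays the update $(h,t) \mapsto (h',t')$ along a ring whose component states are listed starting from $x$, and accepts when one of the three base cases matches the last component $y$. It only looks at component states, not at interactions, and the encoding of states as the strings "H" and "T" is an assumption.

```python
def step(h, t, q):
    """Parameter update (h, t) -> (h', t') for a component in state q."""
    if q == "H":
        return max(h - 1, 0), t
    return h, max(t - 1, 0)


def unfolds_to_ring(states, h, t):
    """Can a ring whose component states are listed from x onwards be obtained
    by unfolding ring_{h,t}(x)?  Only states are checked, not interactions."""
    if len(states) < 2:
        return False                 # ring needs x plus a separate chain from z to y
    for q in states[:-1]:            # x, then each chain component except y
        h, t = step(h, t, q)
    last = states[-1]                # y is matched by a base case chain(y, y)
    return ((h, t) == (0, 0)
            or ((h, t) == (0, 1) and last == "T")
            or ((h, t) == (1, 0) and last == "H"))


print(unfolds_to_ring(["T", "H", "H"], 2, 1))   # two components in H, one in T
print(unfolds_to_ring(["T", "H"], 2, 1))        # only one component in H
```

Because each base case accounts for at most one remaining H or T, the check succeeds exactly when the ring has at least $h$ components in state H and at least $t$ in state T.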


The predicate $\mathit{ring}_{h,t}(x)$ describes a ring with at least two components, such that at least $h$ (resp. $t$) components are in state H (resp. T). The ring consists of a component $x$ in state $q$, described by the formula $[x]@q$, an interaction from the out port of $x$ to the in port of another component $z$, described as $\langle x.out, z.in \rangle$, a separate chain of components stretching from $z$ to $y$ ($\mathit{chain}_{h',t'}(z,y)$), and an interaction connecting the out port of component $y$ to the in port of component $x$ ($\langle y.out, x.in \rangle$). Inductively, a chain consists of a component $[x]@q$, an interaction $\langle x.out, z.in \rangle$ and a separate $\mathit{chain}_{h',t'}(z,y)$. Figure 1(b) depicts the unfolding of the inductive definition of the token ring, with the
[Figure 1: (a) token ring; (b) unfolding of the inductive definition; (c) reconfiguration program with its proof]

Fig. 1. Inductive Specification and Reconfiguration of a Token Ring


existentially quantified variables $z$ from the above rules $\alpha$-renamed to $z^1, z^2, \ldots$ to avoid confusion.

A reconfiguration program takes as input a mapping of program variables to components
and executes a sequence of basic operations, i.e., component/interaction creation/deletion,
involving the components and interactions denoted by these variables.
For instance, the reconfiguration program in Fig. 1(c) takes as input three adjacent com- 
ponents, mapped to the variables x, y and z, respectively, removes the component y 
together with its left and right interactions and reconnects x directly with z. Program- 
ming reconfigurations is error-prone, because the interleaving between reconfiguration 
actions and interactions in a distributed system may lead to bugs that are hard to trace. 
For instance, if a reconfiguration program removes the last component in state T (resp. 
H) from the system, no token transfer interaction may fire and the system deadlocks. 

We prove absence of such errors using a Hoare-style proof system [1], based on the logic introduced above as assertion language. For instance, the proof from Fig. 1(c) shows that the reconfiguration sequence applied to a component $y$ in state H (i.e., $[y]@H$) in a ring with at least $h \ge 2$ components in state H and at least $t \ge 1$ components in state T leads to a ring with at least $h-1$ components in state H and at least $t$ in state T; note that the states of the components may change during the execution of the reconfiguration program, as tokens are moved by interactions.

The proof in Fig. 1(c) uses local axioms specifying, for each basic operation, only those components and interactions required to avoid faulting, with a frame rule $\{\phi\}\ P\ \{\psi\} \Rightarrow \{\phi * F\}\ P\ \{\psi * F\}$; for readability, the frame formulae (from the preconditions of the conclusion of the frame rule applications) are enclosed in boxes.

The proof also uses the consequence rule $\{\phi\}\ P\ \{\psi\} \Rightarrow \{\phi'\}\ P\ \{\psi'\}$ that applies if $\phi'$ is stronger than $\phi$ and $\psi'$ is weaker than $\psi$. The side conditions of the consequence rule require checking the validity of the entailments $\mathit{ring}_{h,t}(y) \models \exists x \exists z \,.\, \langle x.out, y.in \rangle * [y]@H * \langle y.out, z.in \rangle * \mathit{chain}_{h-1,t}(z,x)$ and $\mathit{chain}_{h-1,t}(z,x) * \langle x.out, z.in \rangle \models \mathit{ring}_{h-1,t}(z)$, for all $h \ge 2$ and $t \ge 1$. These side conditions can be automatically discharged using the
results on the decidability of entailments given in this paper. Additionally, checking the 
satisfiability of a precondition is used to detect trivially valid Hoare triples. 


1.2 Related Work 


Formal modeling of coordinating architectures of component-based systems has received
lots of attention, with the development of architecture description languages (ADL), 
such as BIP [3] or REO [2]. Many such ADLs have extensions that describe pro- 
grammed reconfiguration, e.g., [19,30], classified according to the underlying formal- 
ism used to define their operational semantics: process algebras [13,33], graph rewrit- 
ing [32,41,44], chemical reactions [43] (see the surveys [7,11]). Unfortunately, only 
few ADLs support formal verification, mainly in the flavour of runtime verification 
[10,17,20,31] or finite-state model checking [14].

Parameterized verification of unbounded networks of distributed processes uses 
mostly hard-coded coordinating architectures (see [4] for a survey). A first attempt at 
specifying architectures by logic is the interaction logic of Konnov et al. [29], a combi- 
nation of Presburger arithmetic with monadic uninterpreted function symbols, that can 
describe cliques, stars and rings. More structured architectures (pipelines and trees) can 
be described using a second-order extension [34]. However, these interaction logics are 
undecidable and lack support for automated reasoning. 

Specifying parameterized component-based systems by inductive definitions is not 
new. Network grammars [26,32,40] use context-free grammar rules to describe sys- 
tems with linear (pipeline, token-ring) architectures obtained by composition of an 
unbounded number of processes. In contrast, we use predicates of unrestricted arities 
to describe architectural styles that are, in general, more complex than trees. Moreover, 
we write inductive definitions using a resource logic, suitable also for writing Hoare 
logic proofs of reconfiguration programs, based on local reasoning [12]. 

Local reasoning about concurrent programs has been traditionally the focus of Con- 
current Separation Logic (CSL), based on a parallel composition rule [36], initially 
with a non-interfering (race-free) semantics [8] and later combining ideas of assume- 
and rely-guarantee [28,38] with local reasoning [22,42] and abstract notions of framing
[15,16,21]. However, the body of work on CSL deals almost entirely with shared-memory
multithreaded programs, instead of distributed systems, which is the aim of
our work. In contrast, we develop a resource logic in which the processes do not just 
share and own resources, but become mutable resources themselves. 

The techniques developed in this paper are inspired by existing techniques for sim- 
ilar problems in the context of Separation Logic (SL) [39]. For instance, we use an 
abstract domain similar to the one defined by Brotherston et al. [9] for checking satis- 
fiability of symbolic heaps in SL and reduce a fragment of the entailment problem in 
our logic to SL entailment [18]. In particular, the use of existing automated reasoning 
techniques for SL has pointed out several differences between the expressiveness of our 
logic and that of SL. First, the configuration logic describes hypergraph structures, in 
which edges are $\ell$-tuples for $\ell \ge 2$, instead of directed graphs as in SL, where $\ell$ is a
parameter of the problem: considering $\ell$ to be a constant strictly decreases the complexity
of the problem. Second, the degree (number of hyperedges containing a given
vertex) is unbounded, unlike in SL, where the degree of heaps is constant. Therefore,
we dedicate an entire section (Sect. 4) to the problem of deciding the existence of a 
bound (and computing a cut-off) on the degree of the models of a formula, used as a 
prerequisite for the encoding of the entailment problems from the configuration logic 
as SL entailments. 


2 Definitions 


We denote by $\mathbb{N}$ the set of non-negative integers (i.e., including zero). For a set $A$, we define $A^1 \triangleq A$, $A^{i+1} \triangleq A^i \times A$, for all $i \ge 1$, and $A^+ \triangleq \bigcup_{i \ge 1} A^i$, where $\times$ denotes the Cartesian product. We denote by $\mathrm{pow}(A)$ the powerset of $A$ and by $\mathrm{mpow}(A)$ the power-multiset (set of multisets) of $A$. The cardinality of a finite set $A$ is denoted as $||A||$. By writing $A \subseteq_{\mathit{fin}} B$ we mean that $A$ is a finite subset of $B$. Given integers $i$ and $j$, we write $[i,j]$ for the set $\{i, i+1, \ldots, j\}$, assumed to be empty if $i > j$. For a tuple $t = \langle t_1, \ldots, t_n \rangle$, we define $|t| \triangleq n$, $\langle t \rangle_i \triangleq t_i$ and $\langle t \rangle_{[i,j]} \triangleq \langle t_i, \ldots, t_j \rangle$. By writing $x = \mathrm{poly}(y)$, for given $x, y \in \mathbb{N}$, we mean that there exists a polynomial function $f : \mathbb{N} \to \mathbb{N}$ such that $x \le f(y)$.


2.1 Configurations 


We model distributed systems as hypergraphs, whose vertices are components (i.e., the nodes of the network) and hyperedges are interactions (i.e., describing the way the components communicate with each other). The components are taken from a countably infinite set $\mathbb{C}$, called the universe. We consider that each component executes its own copy of the same behavior, represented as a finite-state machine $\mathbb{B} = (P, Q, \rightarrow)$, where $P$ is a finite set of ports, $Q$ is a finite set of states and $\rightarrow \subseteq Q \times P \times Q$ is a transition relation. Intuitively, each transition $q \xrightarrow{p} q'$ of the behavior is triggered by a visible event, represented by the port $p$. For instance, the behavior of the components of the token ring system from Fig. 1(a) is $\mathbb{B} = (\{in, out\}, \{H, T\}, \{H \xrightarrow{in} T, T \xrightarrow{out} H\})$. The universe $\mathbb{C}$ and the behavior $\mathbb{B} = (P, Q, \rightarrow)$ are fixed in the rest of this paper.

We introduce a logic for describing infinite sets of configurations of distributed 
systems with unboundedly many components and interactions. A configuration is a 
snapshot of the system, describing the topology of the network (i.e., the set of present 
components and interactions) together with the local state of each component: 


Definition 1. A configuration is a tuple $\gamma = (C, I, \rho)$, where:

– $C \subseteq_{\mathit{fin}} \mathbb{C}$ is a finite set of components that are present in the configuration,

– $I \subseteq_{\mathit{fin}} (\mathbb{C} \times P)^+$ is a finite set of interactions, where each interaction is a sequence $(c_1, p_1, \ldots, c_n, p_n) \in (\mathbb{C} \times P)^n$ that binds together the ports $p_1, \ldots, p_n$ of the pairwise distinct components $c_1, \ldots, c_n$, respectively, and

– $\rho : \mathbb{C} \to Q$ is a state map associating each (possibly absent) component with a state of the behavior $\mathbb{B}$, such that the set $\{c \in \mathbb{C} \mid \rho(c) = q\}$ is infinite, for each $q \in Q$.


The last condition requires that there is an infinite pool of components in each state $q \in Q$; since $\mathbb{C}$ is infinite and $Q$ is finite, this condition is feasible. For example, the configurations of the token ring from Fig. 1(a) are $(\{c_1, \ldots, c_n\}, \{(c_i, out, c_{(i \bmod n)+1}, in) \mid i \in [1,n]\}, \rho)$, where $\rho : \mathbb{C} \to \{H, T\}$ is a state map. The ring topology is described by the set of components $\{c_1, \ldots, c_n\}$ and interactions $\{(c_i, out, c_{(i \bmod n)+1}, in) \mid i \in [1,n]\}$.

Intuitively, an interaction $(c_1, p_1, \ldots, c_n, p_n)$ synchronizes transitions labeled by the ports $p_1, \ldots, p_n$ from the behaviors (i.e., replicas of the state machine $\mathbb{B}$) of $c_1, \ldots, c_n$, respectively. Note that the components $c_i$ are not necessarily part of the configuration. The interactions are classified according to their sequence of ports, called the interaction type, and we let $\mathsf{Inter} \triangleq P^+$ be the set of interaction types; an interaction type models, for instance, the passing of a certain kind of message (e.g., request, acknowledgement, etc.). From an operational point of view, two interactions that differ by a permutation of indices, e.g., $(c_1, p_1, \ldots, c_n, p_n)$ and $(c_{i_1}, p_{i_1}, \ldots, c_{i_n}, p_{i_n})$ such that $\{i_1, \ldots, i_n\} = [1,n]$, are equivalent, since the set of transitions is the same; nevertheless, we choose to distinguish them in the following, exclusively for reasons of simplicity.

Below we define the composition of configurations, as the union of disjoint sets of 
components and interactions: 


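Definition 2. The composition of two configurations $\gamma_i = (C_i, I_i, \rho)$, for $i = 1,2$, such that $C_1 \cap C_2 = \emptyset$ and $I_1 \cap I_2 = \emptyset$, is defined as $\gamma_1 \bullet \gamma_2 \triangleq (C_1 \cup C_2, I_1 \cup I_2, \rho)$. The composition $\gamma_1 \bullet \gamma_2$ is undefined if $C_1 \cap C_2 \neq \emptyset$ or $I_1 \cap I_2 \neq \emptyset$.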


In analogy with graphs, the degree of a configuration is the maximum number of inter- 
actions from the configuration that involve a (possibly absent) component: 


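Definition 3. The degree of a configuration $\gamma = (C, I, \rho)$ is defined as $\delta(\gamma) \triangleq \max_{c \in \mathbb{C}} \delta_c(\gamma)$, where $\delta_c(\gamma) \triangleq ||\{(c_1, p_1, \ldots, c_n, p_n) \in I \mid c = c_i,\ i \in [1,n]\}||$.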


For instance, the configuration of the system from Fig. 1(a) has degree two. 
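Definitions 1 to 3 are easy to prototype. The Python sketch below rests on our own assumptions: components are strings, an interaction is a flat tuple $(c_1, p_1, \ldots, c_n, p_n)$, the state map is omitted because composition leaves it unchanged, and the names `Config`, `compose`, `degree` and `token_ring` are hypothetical. It builds the token ring of Fig. 1(a) by composing single-component and single-interaction configurations and computes its degree.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Config:
    """Topology (C, I) of a configuration; the shared state map rho is left implicit."""
    comps: frozenset
    inters: frozenset   # each interaction is a flat tuple (c1, p1, ..., cn, pn)


def compose(g1, g2):
    """Composition g1 . g2 (Definition 2); None when it is undefined."""
    if g1.comps & g2.comps or g1.inters & g2.inters:
        return None
    return Config(g1.comps | g2.comps, g1.inters | g2.inters)


def degree(g):
    """delta(g) (Definition 3): most interactions involving a single component."""
    counts = {}
    for inter in g.inters:
        for c in inter[0::2]:          # components sit at the even positions
            counts[c] = counts.get(c, 0) + 1
    return max(counts.values(), default=0)


def token_ring(n):
    """Token ring of Fig. 1(a): the out port of c_i meets the in port of c_{(i mod n)+1}."""
    parts = [Config(frozenset({f"c{i}"}), frozenset()) for i in range(1, n + 1)]
    parts += [Config(frozenset(), frozenset({(f"c{i}", "out", f"c{i % n + 1}", "in")}))
              for i in range(1, n + 1)]
    return reduce(compose, parts)


ring = token_ring(5)
print(len(ring.comps), len(ring.inters), degree(ring))
```

Counting occurrences per interaction is enough here because Definition 1 requires the components of an interaction to be pairwise distinct; components that occur in no interaction contribute degree zero.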


2.2 Configuration Logic 


Let $\mathbb{V}$ and $\mathbb{A}$ be countably infinite sets of variables and predicates, respectively. For each predicate $A \in \mathbb{A}$, we denote its arity by $\#A$. The formulae of the Configuration Logic (CL) are described inductively by the following syntax:

$\phi := \mathsf{emp} \mid [x] \mid \langle x_1.p_1, \ldots, x_n.p_n \rangle \mid x@q \mid x = y \mid x \neq y \mid A(x_1, \ldots, x_{\#A}) \mid \phi * \phi \mid \exists x \,.\, \phi$

where $x, y, x_1, \ldots \in \mathbb{V}$, $q \in Q$ and $A \in \mathbb{A}$. A formula $[x]$, $\langle x_1.p_1, \ldots, x_n.p_n \rangle$, $x@q$ and $A(x_1, \ldots, x_{\#A})$ is called a component, interaction, state and predicate atom, respectively. These formulae are also referred to as atoms. The connective $*$ is called the separating conjunction. We use the shorthand $[x]@q \triangleq [x] * x@q$. For instance, the formula $[x]@q * [y]@q' * \langle x.out, y.in \rangle * \langle x.in, y.out \rangle$ describes a configuration consisting of two distinct components, denoted by the values of $x$ and $y$, in states $q$ and $q'$, respectively, and two interactions binding the out port of one to the in port of the other component.

A formula is said to be pure if and only if it is a separating conjunction of state atoms, equalities and disequalities. A formula with no occurrences of predicate atoms (resp. existential quantifiers) is called predicate-free (resp. quantifier-free). A variable is free if it does not occur within the scope of an existential quantifier; we denote by $\mathrm{fv}(\phi)$ the set of free variables of $\phi$. A sentence is a formula with no free variables. A substitution


Decision Problems in a Logic for Reasoning 697 


$\phi[x_1/y_1, \ldots, x_n/y_n]$ replaces simultaneously every free occurrence of $x_i$ by $y_i$ in $\phi$, for all $i \in [1,n]$. Before defining the semantics of CL formulae, we introduce the set of inductive definitions that assigns meaning to predicates:


Definition 4. A set of inductive definitions (SID) $\Delta$ consists of rules $A(x_1, \ldots, x_{\#A}) \leftarrow \phi$, where $x_1, \ldots, x_{\#A}$ are pairwise distinct variables, called parameters, such that $\mathrm{fv}(\phi) \subseteq \{x_1, \ldots, x_{\#A}\}$. The rule $A(x_1, \ldots, x_{\#A}) \leftarrow \phi$ defines $A$ and we denote by $\mathrm{def}_\Delta(A)$ the set of rules from $\Delta$ that define $A$.

Note that having distinct parameters in a rule is without loss of generality, as e.g., a rule $A(x_1, x_1) \leftarrow \phi$ can be equivalently written as $A(x_1, x_2) \leftarrow x_1 = x_2 * \phi$. As a convention, we shall always use the names $x_1, \ldots, x_{\#A}$ for the parameters of a rule that defines $A$.

The semantics of CL formulae is defined by a satisfaction relation $\gamma \models^\nu_\Delta \phi$ between configurations and formulae. This relation is parameterized by a store $\nu : \mathbb{V} \to \mathbb{C}$ mapping the free variables of a formula into components from the universe (possibly absent from $\gamma$) and an SID $\Delta$. We write $\nu[x \leftarrow c]$ for the store that maps $x$ into $c$ and agrees with $\nu$ on all variables other than $x$. The definition of the satisfaction relation is by induction on the structure of formulae, where $\gamma = (C, I, \rho)$ is a configuration (Definition 1):


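$\gamma \models^\nu_\Delta \mathsf{emp} \iff C = \emptyset$ and $I = \emptyset$

$\gamma \models^\nu_\Delta [x] \iff C = \{\nu(x)\}$ and $I = \emptyset$

$\gamma \models^\nu_\Delta \langle x_1.p_1, \ldots, x_n.p_n \rangle \iff C = \emptyset$ and $I = \{(\nu(x_1), p_1, \ldots, \nu(x_n), p_n)\}$

$\gamma \models^\nu_\Delta x@q \iff \gamma \models^\nu_\Delta \mathsf{emp}$ and $\rho(\nu(x)) = q$

$\gamma \models^\nu_\Delta x \sim y \iff \gamma \models^\nu_\Delta \mathsf{emp}$ and $\nu(x) \sim \nu(y)$, for all $\sim \in \{=, \neq\}$

$\gamma \models^\nu_\Delta A(y_1, \ldots, y_{\#A}) \iff \gamma \models^\nu_\Delta \phi[x_1/y_1, \ldots, x_{\#A}/y_{\#A}]$, for some rule $A(x_1, \ldots, x_{\#A}) \leftarrow \phi$ from $\Delta$

$\gamma \models^\nu_\Delta \phi_1 * \phi_2 \iff$ there exist $\gamma_1, \gamma_2$ such that $\gamma = \gamma_1 \bullet \gamma_2$ and $\gamma_i \models^\nu_\Delta \phi_i$, for $i = 1,2$

$\gamma \models^\nu_\Delta \exists x \,.\, \phi \iff \gamma \models^{\nu[x \leftarrow c]}_\Delta \phi$, for some $c \in \mathbb{C}$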
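If $\phi$ is a sentence, the satisfaction relation $\gamma \models^\nu_\Delta \phi$ does not depend on the store, written $\gamma \models_\Delta \phi$, in which case we say that $\gamma$ is a model of $\phi$. If $\phi$ is a predicate-free formula, the satisfaction relation does not depend on the SID, written $\gamma \models^\nu \phi$. A formula $\phi$ is satisfiable if and only if the sentence $\exists x_1 \ldots \exists x_n \,.\, \phi$ has a model, where $\mathrm{fv}(\phi) = \{x_1, \ldots, x_n\}$. A formula $\phi$ entails a formula $\psi$, written $\phi \models_\Delta \psi$, if and only if, for any configuration $\gamma$ and store $\nu$, we have $\gamma \models^\nu_\Delta \phi$ only if $\gamma \models^\nu_\Delta \psi$.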
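For predicate-free and quantifier-free formulae that are separating conjunctions of atoms, the semantics above reduces to a direct check: each atom determines its own piece of the configuration, and $*$ requires these pieces to be pairwise disjoint and to cover $\gamma$ exactly. The Python sketch below handles only this special case. The atom encoding (tuples tagged `"comp"`, `"inter"`, `"state"`, `"eq"`, `"neq"`, `"emp"`), the representation of $\rho$ as a function, and the name `satisfies` are our assumptions.

```python
def satisfies(C, I, rho, store, atoms):
    """Check (C, I, rho) |=^store a_1 * ... * a_m for predicate- and quantifier-free atoms.

    Hypothetical atom encoding: ("emp",), ("comp", x), ("state", x, q),
    ("eq", x, y), ("neq", x, y) and ("inter", [(x1, p1), ..., (xn, pn)]).
    """
    comps, inters = [], []
    for atom in atoms:
        kind = atom[0]
        if kind == "comp":                       # [x] owns exactly one component
            comps.append(store[atom[1]])
        elif kind == "inter":                    # <x1.p1, ..., xn.pn> owns one interaction
            cs = [store[x] for x, _ in atom[1]]
            if len(set(cs)) != len(cs):          # Definition 1: pairwise distinct components
                return False
            inters.append(tuple(v for x, p in atom[1] for v in (store[x], p)))
        elif kind == "state" and rho(store[atom[1]]) != atom[2]:
            return False
        elif kind == "eq" and store[atom[1]] != store[atom[2]]:
            return False
        elif kind == "neq" and store[atom[1]] == store[atom[2]]:
            return False
    # Separating conjunction: the owned pieces are disjoint and cover gamma exactly.
    if len(set(comps)) != len(comps) or len(set(inters)) != len(inters):
        return False
    return set(comps) == set(C) and set(inters) == set(I)
```

For example, the two-component ring with $c_1$ in state T and $c_2$ in state H satisfies $[x]@T * [y]@H * \langle x.out, y.in \rangle * \langle y.out, x.in \rangle$ under the store $x \mapsto c_1, y \mapsto c_2$, but not under the swapped store.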


2.3 Separation Logic 


Separation Logic (SL) [39] will be used in the following to prove several technical results concerning the decidability and complexity of certain decision problems for CL. For self-containment reasons, we define SL below. The syntax of SL formulae is described by the following grammar:

$\phi := \mathsf{emp} \mid x_0 \mapsto (x_1, \ldots, x_{\mathfrak{K}}) \mid x = y \mid x \neq y \mid A(x_1, \ldots, x_{\#A}) \mid \phi * \phi \mid \exists x \,.\, \phi$

where $x, y, x_0, x_1, \ldots \in \mathbb{V}$, $A \in \mathbb{A}$ and $\mathfrak{K} \ge 1$ is an integer constant. Formulae of SL are interpreted over finite partial functions $h : \mathbb{C} \rightharpoonup_{\mathit{fin}} \mathbb{C}^{\mathfrak{K}}$, called heaps¹, by a satisfaction relation $h \Vdash^\nu_\Delta \phi$, defined inductively as follows:


¹ We use the universe $\mathbb{C}$ here for simplicity; the definition works with any countably infinite set.
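$h \Vdash^\nu_\Delta \mathsf{emp} \iff h = \emptyset$

$h \Vdash^\nu_\Delta x_0 \mapsto (x_1, \ldots, x_{\mathfrak{K}}) \iff \mathrm{dom}(h) = \{\nu(x_0)\}$ and $h(\nu(x_0)) = \langle \nu(x_1), \ldots, \nu(x_{\mathfrak{K}}) \rangle$

$h \Vdash^\nu_\Delta \phi_1 * \phi_2 \iff$ there exist $h_1, h_2$ such that $\mathrm{dom}(h_1) \cap \mathrm{dom}(h_2) = \emptyset$, $h = h_1 \cup h_2$ and $h_i \Vdash^\nu_\Delta \phi_i$, for both $i = 1,2$

where $\mathrm{dom}(h) \triangleq \{c \in \mathbb{C} \mid h(c)$ is defined$\}$ is the domain of the heap, and (dis-)equalities, predicate atoms and existential quantifiers are defined the same as for CL.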


2.4 Decision Problems 


We define the decision problems that are the focus of the upcoming sections. As usual, 
a decision problem is a class of yes/no queries that differ only in their input. In our case, 
the input consists of an SID and one or two predicates, written between square brackets. 


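Definition 5. We consider the following problems, for an SID $\Delta$ and predicates $A, B \in \mathbb{A}$:

1. Sat[$\Delta$, $A$]: is the sentence $\exists x_1 \ldots \exists x_{\#A} \,.\, A(x_1, \ldots, x_{\#A})$ satisfiable for $\Delta$?
2. Bnd[$\Delta$, $A$]: is the set $\{\delta(\gamma) \mid \gamma \models_\Delta \exists x_1 \ldots \exists x_{\#A} \,.\, A(x_1, \ldots, x_{\#A})\}$ finite?
3. Entl[$\Delta$, $A$, $B$]: does $A(x_1, \ldots, x_{\#A}) \models_\Delta \exists x_{\#A+1} \ldots \exists x_{\#B} \,.\, B(x_1, \ldots, x_{\#B})$ hold?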


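The size of a formula $\phi$ is the total number of occurrences of symbols needed to write it down, denoted by $\mathrm{size}(\phi)$. The size of an SID $\Delta$ is $\mathrm{size}(\Delta) \triangleq \sum_{A(x_1, \ldots, x_{\#A}) \leftarrow \phi \,\in\, \Delta} (\mathrm{size}(\phi) + \#A + 1)$. Other parameters of an SID $\Delta$ are:

– $\mathrm{arity}(\Delta) \triangleq \max\{\#A \mid A(x_1, \ldots, x_{\#A}) \leftarrow \phi \in \Delta\}$,
– $\mathrm{width}(\Delta) \triangleq \max\{\mathrm{size}(\phi) \mid A(x_1, \ldots, x_{\#A}) \leftarrow \phi \in \Delta\}$,
– $\mathrm{intersize}(\Delta) \triangleq \max\{n \mid \langle x_1.p_1, \ldots, x_n.p_n \rangle$ occurs in $\phi$, $A(x_1, \ldots, x_{\#A}) \leftarrow \phi \in \Delta\}$.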


For a decision problem $P[\Delta, A, B]$, we consider its $(k,\ell)$-bounded versions $P^{(k,\ell)}[\Delta, A, B]$, obtained by restricting the predicates and interaction atoms occurring in $\Delta$ to $\mathrm{arity}(\Delta) \le k$ and $\mathrm{intersize}(\Delta) \le \ell$, respectively, where $k$ and $\ell$ are either positive integers or infinity. We consider, for each $P[\Delta, A, B]$, the subproblems $P^{(k,\ell)}[\Delta, A, B]$ corresponding to the three cases (1) $k < \infty$ and $\ell = \infty$, (2) $k = \infty$ and $\ell < \infty$, and (3) $k = \infty$ and $\ell = \infty$. The remaining case $k < \infty$, $\ell < \infty$ is not treated separately because, as we explain next, for the decision problems considered (Definition 5) its complexity matches the one for the case $k < \infty$, $\ell = \infty$.

Satisfiability (1) and entailment (3) arise naturally during verification of reconfiguration programs. For instance, Sat[$\Delta$, $\phi$] asks whether a specification $\phi$ of a set of configurations (e.g., a pre-, post-condition, or a loop invariant) is empty or not (e.g., an empty precondition typically denotes a vacuous verification condition), whereas Entl[$\Delta$, $\phi$, $\psi$] is used as a side condition for the Hoare rule of consequence, as in e.g., the proof from Fig. 1(c). Moreover, entailments must be proved when checking inductiveness of a user-provided loop invariant.

The Bnd[$\Delta$, $\phi$] problem is used to check a necessary condition for the decidability of entailments, i.e., Entl[$\Delta$, $\phi$, $\psi$]. If Bnd[$\Delta$, $\phi$] has a positive answer, we can reduce the problem Entl[$\Delta$, $\phi$, $\psi$] to an entailment problem for SL, which is always interpreted over heaps of bounded degree [18]. Otherwise, the decidability status of the entailment problem is open for configurations of unbounded degree, such as the one described by the example below.
Example 1. The following SID describes star topologies with a central controller connected to an unbounded number of worker stations:

$\mathit{Controller}(x) \leftarrow [x] * \mathit{Worker}(x)$
$\mathit{Worker}(x) \leftarrow \exists y \,.\, \langle x.out, y.in \rangle * [y] * \mathit{Worker}(x) \qquad \mathit{Worker}(x) \leftarrow \mathsf{emp}$
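To see why the degree of these models is unbounded, note that each unfolding of the recursive Worker rule adds one more interaction involving the controller $x$. The Python sketch below, whose helper names and string encoding of components and ports are hypothetical, builds the model obtained after $n$ unfoldings and computes its degree as in Definition 3.

```python
def star_model(n):
    """Model of Controller(x) after n unfoldings of the recursive Worker rule."""
    comps = {"ctl"} | {f"w{i}" for i in range(1, n + 1)}
    inters = {("ctl", "out", f"w{i}", "in") for i in range(1, n + 1)}
    return comps, inters


def degree(inters):
    """Most interactions that involve a single component (Definition 3)."""
    counts = {}
    for inter in inters:
        for c in inter[0::2]:          # components sit at the even positions
            counts[c] = counts.get(c, 0) + 1
    return max(counts.values(), default=0)


for n in (1, 10, 100):
    print(n, degree(star_model(n)[1]))   # the controller takes part in all n interactions
```

Since the degree equals $n$ for every $n$, the set of degrees is infinite, so Bnd has a negative answer for Controller under this SID.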


3 Satisfiability 


We show that the satisfiability problem (Definition 5, point 1) is decidable, using a method similar to the one pioneered by Brotherston et al. [9] for checking satisfiability of inductively defined symbolic heaps in SL. We recall that a formula $\pi$ is pure if and only if it is a separating conjunction of equalities, disequalities and state atoms. In the following, the order of terms in (dis-)equalities is not important, i.e., we consider $x = y$ (resp. $x \neq y$) and $y = x$ (resp. $y \neq x$) to be the same formula.


Definition 6. The closure $\mathrm{cl}(\pi)$ of a pure formula $\pi$ is the limit of the sequence $\pi^0, \pi^1, \pi^2, \ldots$ such that $\pi^0 = \pi$ and, for each $i \ge 0$, $\pi^{i+1}$ is obtained by joining (with $*$) all of the following formulae to $\pi^i$:

– $x = z$, where $x$ and $z$ are the same variable, or $x = y$ and $y = z$ both occur in $\pi^i$,
– $x \neq z$, where $x = y$ and $y \neq z$ both occur in $\pi^i$, or
– $y@q$, where $x@q$ and $x = y$ both occur in $\pi^i$.


Because only finitely many such formulae can be added, the sequence of pure formulae from Definition 6 is bound to stabilize after polynomially many steps. A pure formula is satisfiable if and only if its closure does not contain contradictory literals, i.e., $x = y$ and $x \neq y$, or $x@q$ and $x@q'$, for $q \neq q' \in Q$. We write $x \approx_\pi y$ (resp. $x \not\approx_\pi y$) if and only if $x = y$ (resp. $x \neq y$) occurs in $\mathrm{cl}(\pi)$, and $\mathrm{not}(x \approx_\pi y)$ (resp. $\mathrm{not}(x \not\approx_\pi y)$) whenever $x \approx_\pi y$ (resp. $x \not\approx_\pi y$) does not hold. Note that, e.g., $\mathrm{not}(x \approx_\pi y)$ is not the same as $x \not\approx_\pi y$.
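Definition 6 and the satisfiability criterion above can be prototyped as a naive fixpoint computation. In the Python sketch below, a pure formula is a set of literals `("eq", x, y)`, `("neq", x, y)` and `("state", x, q)`; this encoding and the function names are our assumptions. The loop applies the three closure rules until no new literal appears, which terminates because only literals over the variables and states already present in $\pi$ can be produced.

```python
def _norm(lit):
    """(Dis)equalities are unordered: store their two sides in sorted order."""
    kind, a, b = lit
    return (kind, *sorted((a, b))) if kind in ("eq", "neq") else lit


def _vars(lit):
    kind, a, b = lit
    return (a, b) if kind in ("eq", "neq") else (a,)


def closure(pi):
    """cl(pi) from Definition 6, computed as a fixpoint of the three rules."""
    cl = {_norm(lit) for lit in pi}
    while True:
        new = {("eq", v, v) for lit in cl for v in _vars(lit)}   # x = x
        eqs = [(a, b) for kind, a, b in cl if kind == "eq"]
        for x, y in eqs + [(b, a) for a, b in eqs]:             # x = y, both orientations
            for kind, a, b in cl:
                if kind in ("eq", "neq") and y in (a, b):
                    z = b if a == y else a
                    new.add(_norm((kind, x, z)))                 # x = z or x != z
                elif kind == "state" and a == x:
                    new.add(("state", y, b))                     # y@q from x@q
        if new <= cl:
            return cl
        cl |= new


def is_satisfiable(pi):
    """A pure formula is satisfiable iff its closure has no contradictory literals."""
    cl = closure(pi)
    if any(kind == "neq" and ("eq", a, b) in cl for kind, a, b in cl):
        return False
    state_of = {}
    for kind, a, b in cl:
        if kind == "state" and state_of.setdefault(a, b) != b:
            return False
    return True
```

For instance, $x = y * y = z * x \neq z$ is reported unsatisfiable, because the closure derives $x = z$.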

Base tuples constitute the abstract domain used by the algorithms for checking sat- 
isfiability (point 1 of Definition 5) and boundedness (point 2 of Definition 5), defined 
as follows: 


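Definition 7. A base tuple is a triple $t = (C^\sharp, I^\sharp, \pi)$, where:

– $C^\sharp \in \mathrm{mpow}(\mathbb{V})$ is a multiset of variables denoting present components,

– $I^\sharp : \mathsf{Inter} \to \mathrm{mpow}(\mathbb{V}^+)$ maps each interaction type $\tau \in \mathsf{Inter}$ into a multiset of tuples of variables of length $|\tau|$ each, and

– $\pi$ is a pure formula.

A base tuple is called satisfiable if and only if $\pi$ is satisfiable and the following hold:

1. for any two distinct occurrences of variables $x, y$ in $C^\sharp$, $\mathrm{not}(x \approx_\pi y)$,

2. for all $\tau \in \mathsf{Inter}$ and any two distinct occurrences of tuples $\langle x_1, \ldots, x_{|\tau|} \rangle, \langle y_1, \ldots, y_{|\tau|} \rangle$ in $I^\sharp(\tau)$, there exists $i \in [1, |\tau|]$ such that $\mathrm{not}(x_i \approx_\pi y_i)$,

3. for all $\tau \in \mathsf{Inter}$, $\langle x_1, \ldots, x_{|\tau|} \rangle \in I^\sharp(\tau)$ and $1 \le i < j \le |\tau|$, we have $\mathrm{not}(x_i \approx_\pi x_j)$.

We denote by SatBase the set of satisfiable base tuples.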
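Conditions 1 to 3 only depend on which variables are related by $\approx_\pi$. The Python sketch below assumes that $\pi$ has already been found satisfiable and represents it, by assumption, only through its list of equalities; since no closure rule of Definition 6 derives equalities from disequalities or state atoms, $x \approx_\pi y$ then holds exactly when $x$ and $y$ are connected by these equalities, so a union-find structure suffices. The encoding of $C^\sharp$ as a list, of $I^\sharp$ as a dictionary from interaction types (tuples of ports) to lists of variable tuples, and the function names are hypothetical.

```python
def make_find(equalities):
    """Union-find over variables: find(x) == find(y) iff the equalities relate x and y."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in equalities:
        parent[find(a)] = find(b)
    return find


def base_tuple_satisfiable(comps, inters, equalities):
    """Conditions 1-3 of Definition 7, assuming the pure part itself is satisfiable."""
    find = make_find(equalities)
    roots = [find(x) for x in comps]
    if len(set(roots)) != len(roots):            # 1. two occurrences denote one component
        return False
    for tuples in inters.values():
        keys = [tuple(find(x) for x in tup) for tup in tuples]
        if len(set(keys)) != len(keys):          # 2. two occurrences denote one interaction
            return False
        if any(len(set(k)) != len(k) for k in keys):
            return False                         # 3. repeated component inside an interaction
    return True
```

As noted next in the text, a variable (or tuple) occurring twice in the multiset already violates these conditions, which the sketch reflects by comparing list lengths with set sizes.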
Intuitively, a base tuple is an abstract representation of a configuration, where components (resp. interactions) are represented by variables (resp. tuples of variables). Note that a base tuple $(C^\sharp, I^\sharp, \pi)$ is unsatisfiable if $C^\sharp$ (resp. $I^\sharp$) contains the same variable (resp. tuple of variables) twice (for the same interaction type), hence the use of multisets in the definition of base tuples. It is easy to see that checking the satisfiability of a given base tuple $(C^\sharp, I^\sharp, \pi)$ can be done in time $\mathrm{poly}(||C^\sharp|| + \sum_{\tau \in \mathsf{Inter}} ||I^\sharp(\tau)|| + \mathrm{size}(\pi))$. We define a partial composition operation on satisfiable base tuples, as follows:

$(C^\sharp_1, I^\sharp_1, \pi_1) \otimes (C^\sharp_2, I^\sharp_2, \pi_2) \triangleq (C^\sharp_1 \cup C^\sharp_2, I^\sharp_1 \cup I^\sharp_2, \pi_1 * \pi_2)$

where the union of multisets is lifted to functions $\mathsf{Inter} \to \mathrm{mpow}(\mathbb{V}^+)$ in the usual way. The composition operation $\otimes$ is undefined if $(C^\sharp_1, I^\sharp_1, \pi_1) \otimes (C^\sharp_2, I^\sharp_2, \pi_2)$ is not satisfiable, e.g., if $C^\sharp_1 \cap C^\sharp_2 \neq \emptyset$, $I^\sharp_1(\tau) \cap I^\sharp_2(\tau) \neq \emptyset$ for some $\tau \in \mathsf{Inter}$, or $\pi_1 * \pi_2$ is not satisfiable.
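Given a pure formula π and a set of variables X, the projection π↓X removes from π all atoms α such that fv(α) ⊄ X. The projection of a base tuple (C♯, I♯, π) on a variable set X is formally defined below:

$$ (C^\sharp, I^\sharp, \pi)\!\downarrow_X \stackrel{\mathrm{def}}{=} \Big(C^\sharp \cap X,\ \lambda\tau.\,\{\langle x_1,\ldots,x_{|\tau|}\rangle \in I^\sharp(\tau) \mid x_1,\ldots,x_{|\tau|} \in X\},\ \mathrm{cl}\big(\mathrm{dist}(I^\sharp) * \pi\big)\!\downarrow_X\Big) $$

where

$$ \mathrm{dist}(I^\sharp) \stackrel{\mathrm{def}}{=} \mathop{\ast}_{\tau \in \mathsf{Inter}}\ \mathop{\ast}_{\langle x_1,\ldots,x_{|\tau|}\rangle \in I^\sharp(\tau)}\ \mathop{\ast}_{1 \le i < j \le |\tau|} x_i \neq x_j $$

The substitution operation (C♯, I♯, π)[x1/y1, ..., xn/yn] replaces simultaneously each x_i with y_i in C♯, I♯ and π, respectively. We lift the composition, projection and substitution operations to sets of satisfiable base tuples, as usual.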
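The satisfiability conditions of Definition 7 and the partial composition ⊗ can be prototyped as below. This is a sketch under our own encoding (a `Counter` for the multiset C♯, a dictionary of `Counter`s for I♯, pure atoms as triples ("eq", x, y), ("neq", x, y), ("at", x, q)); it computes ≈_π with union-find rather than the syntactic closure, which yields the same equality classes when (dis)equalities are read as symmetric.

```python
from collections import Counter
from typing import Dict, FrozenSet, NamedTuple, Optional, Set, Tuple

Atom = Tuple[str, str, str]  # ("eq", x, y), ("neq", x, y) or ("at", x, q)


class BaseTuple(NamedTuple):
    comps: Counter                             # C#: multiset of variables
    inters: Dict[Tuple[str, ...], Counter]     # I#: type -> multiset of tuples
    pi: FrozenSet[Atom]                        # pure formula


def _representatives(pi: FrozenSet[Atom], variables: Set[str]) -> Dict[str, str]:
    """Union-find over the equalities of pi: maps each variable to its class."""
    parent = {v: v for v in variables}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for kind, x, y in pi:
        if kind == "eq":
            parent[find(x)] = find(y)
    return {v: find(v) for v in variables}


def satisfiable(t: BaseTuple) -> bool:
    variables = set(t.comps)
    for tuples in t.inters.values():
        for tup in tuples:
            variables.update(tup)
    for kind, x, y in t.pi:
        variables.update([x] if kind == "at" else [x, y])
    rep = _representatives(t.pi, variables)
    # pi is satisfiable: no x = y with x != y, and at most one state per class
    if any(kind == "neq" and rep[x] == rep[y] for kind, x, y in t.pi):
        return False
    state: Dict[str, str] = {}
    for kind, x, q in t.pi:
        if kind == "at" and state.setdefault(rep[x], q) != q:
            return False
    # point 1: distinct occurrences in C# must not be equal under pi
    classes = [rep[x] for x in t.comps.elements()]
    if len(classes) != len(set(classes)):
        return False
    for tuples in t.inters.values():
        keys = [tuple(rep[x] for x in tup) for tup in tuples.elements()]
        if len(keys) != len(set(keys)):                       # point 2
            return False
        if any(len(set(key)) != len(key) for key in keys):    # point 3
            return False
    return True


def compose(t1: BaseTuple, t2: BaseTuple) -> Optional[BaseTuple]:
    """Partial composition: multiset unions and pi1 * pi2; None when undefined."""
    types = set(t1.inters) | set(t2.inters)
    inters = {tau: t1.inters.get(tau, Counter()) + t2.inters.get(tau, Counter())
              for tau in types}
    t = BaseTuple(t1.comps + t2.comps, inters, t1.pi | t2.pi)
    return t if satisfiable(t) else None
```

Returning `None` plays the role of "⊗ is undefined"; composing a base tuple with itself, for instance, duplicates its components and is rejected.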

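Next, we define the base tuple corresponding to a quantifier- and predicate-free formula φ = ψ * π, where ψ consists of component and interaction atoms and π is pure. Since, moreover, we are interested in those components and interactions that are visible through a given indexed set of parameters X = {x1, ..., xn}, for a variable y, we denote by {{y}}^X_π the parameter x_i with the least index such that y ≈_π x_i, or y itself, if no such parameter exists. We define the following sets of formulae:

$$ \mathrm{Base}(\varphi, X) \stackrel{\mathrm{def}}{=} \begin{cases} \{(C^\sharp, I^\sharp, \pi)\} & \text{if } (C^\sharp, I^\sharp, \pi) \text{ is satisfiable} \\ \emptyset & \text{otherwise} \end{cases} $$

where

$$ C^\sharp \stackrel{\mathrm{def}}{=} \{ \{\!\{x\}\!\}^X_\pi \mid [x] \text{ occurs in } \psi \} \qquad I^\sharp \stackrel{\mathrm{def}}{=} \lambda\langle p_1,\ldots,p_s\rangle.\, \{ \langle \{\!\{y_1\}\!\}^X_\pi,\ldots,\{\!\{y_s\}\!\}^X_\pi \rangle \mid \langle y_1.p_1,\ldots,y_s.p_s\rangle \text{ occurs in } \psi \} $$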

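We consider a tuple of variables 𝒳 having a variable 𝒳(A) ranging over pow(SatBase), for each predicate A that occurs in Δ. With these definitions, each rule of Δ:

$$ A(x_1,\ldots,x_{\#A}) \leftarrow \exists y_1 \ldots \exists y_m .\ \varphi * B_1(z^1_1,\ldots,z^1_{\#B_1}) * \cdots * B_h(z^h_1,\ldots,z^h_{\#B_h}) $$

where φ is a quantifier- and predicate-free formula, induces the constraint:

$$ \mathcal{X}(A) \supseteq \Big(\mathrm{Base}(\varphi, \{x_1,\ldots,x_{\#A}\}) \otimes \bigotimes_{\ell=1}^{h} \mathcal{X}(B_\ell)[x_1/z^\ell_1,\ldots,x_{\#B_\ell}/z^\ell_{\#B_\ell}]\Big)\!\downarrow_{\{x_1,\ldots,x_{\#A}\}} \tag{1} $$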
input: a SID Δ    output: μ𝒳.Δ♯
1: initially μ𝒳.Δ♯ := λA.∅
2: for A(x1, ..., x#A) ← ∃y1 ... ∃ym . φ ∈ Δ, with φ quantifier- and predicate-free do
3:     μ𝒳.Δ♯(A) := μ𝒳.Δ♯(A) ∪ Base(φ, {x1, ..., x#A})↓{x1, ..., x#A}
4: while μ𝒳.Δ♯ still changes do
5:     for r : A(x1, ..., x#A) ← ∃y1 ... ∃ym . φ * B1(z^1_1, ..., z^1_#B1) * ... * Bh(z^h_1, ..., z^h_#Bh) ∈ Δ do
6:         if there exist t1 ∈ μ𝒳.Δ♯(B1), ..., th ∈ μ𝒳.Δ♯(Bh) then
7:             μ𝒳.Δ♯(A) := μ𝒳.Δ♯(A) ∪ (Base(φ, {x1, ..., x#A}) ⊗ ⨂_{ℓ=1..h} tℓ[x1/z^ℓ_1, ..., x#Bℓ/z^ℓ_#Bℓ])↓{x1, ..., x#A}
Fig. 2. Algorithm for the Computation of the Least Solution


Let Δ♯ be the set of such constraints, corresponding to the rules in Δ, and let μ𝒳.Δ♯ be the tuple of least solutions of the constraint system generated from Δ, indexed by the tuple of predicates that occur in Δ, such that μ𝒳.Δ♯(A) denotes the entry of μ𝒳.Δ♯ corresponding to A. Since the composition and projection are monotonic operations, such a least solution exists and is unique. Since SatBase is finite, the least solution can be attained in a finite number of steps, using a Kleene iteration (see Fig. 2).
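A generic version of the Kleene iteration of Fig. 2 is sketched below. It assumes that each rule is given as a head predicate, a tuple of body predicates, and a hypothetical `combine` function standing for "compose with Base(φ, ...), substitute, then project", which returns `None` when the composition is undefined. As in the text, termination relies on the finiteness of the domain.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Hashable, Optional, Sequence, Set, Tuple

# Hypothetical rule encoding (head, body predicates, combine): `combine` gets one
# element per body predicate and returns the induced element for the head, or
# None when the (partial) composition is undefined. Every body predicate is
# assumed to be the head of some rule.
Rule = Tuple[str, Tuple[str, ...], Callable[..., Optional[Hashable]]]


def least_solution(rules: Sequence[Rule]) -> Dict[str, FrozenSet[Hashable]]:
    sol: Dict[str, Set[Hashable]] = {head: set() for head, _, _ in rules}
    changed = True
    while changed:                                   # Kleene iteration
        changed = False
        for head, body, combine in rules:
            # rules without predicate atoms have an empty body: product() yields ()
            for args in product(*(list(sol[b]) for b in body)):
                t = combine(*args)
                if t is not None and t not in sol[head]:
                    sol[head].add(t)
                    changed = True
    return {head: frozenset(s) for head, s in sol.items()}


# Toy finite domain: Nat collects 0..4, Lt collects pairs (a, b) with a < b.
TOY_RULES = [
    ("Nat", (), lambda: 0),
    ("Nat", ("Nat",), lambda v: v + 1 if v < 4 else None),
    ("Lt", ("Nat", "Nat"), lambda a, b: (a, b) if a < b else None),
]
```

Each pass snapshots the current entries before combining them, so the loop only stops after a full pass in which no entry grows.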

We state below the main result leading to an elementary recursive algorithm for the satisfiability problem (Theorem 1). The intuition is that, if μ𝒳.Δ♯(A) is not empty, then it contains only satisfiable base tuples, from which a model of A(x1, ..., x#A) can be built.


Lemma 1. Sat[Δ, A] has a positive answer if and only if μ𝒳.Δ♯(A) ≠ ∅.


If the maximal arity of the predicates occurring in Δ is bound by a constant k, no satisfiable base tuple (C♯, I♯, π) can have a tuple ⟨y1, ..., y|τ|⟩ ∈ I♯(τ), for some τ ∈ Inter, such that |τ| > k, since all variables y1, ..., y|τ| are parameters denoting distinct components (point 3 of Definition 7). Hence, the upper bound on the size of a satisfiable base tuple is constant in both the k < ∞, ℓ < ∞ and k < ∞, ℓ = ∞ cases, which are, moreover, indistinguishable complexity-wise (i.e., both are NP-complete). In contrast, in the cases k = ∞, ℓ < ∞ and k = ∞, ℓ = ∞, the upper bound on the size of satisfiable base tuples is polynomial (resp. simply exponential) in size(Δ), incurring a complexity gap of one (resp. two) exponentials. The theorem below states the main result of this section:


Theorem 1. Sat^{k,∞}[Δ, A] is NP-complete for k ≥ 4, Sat^{∞,ℓ}[Δ, A] is EXP-complete, and Sat[Δ, A] is in 2EXP.


The upper bounds are consequences of the fact that the size of a satisfiable base tuple is bounded by a simple exponential in min(arity(Δ), intersize(Δ)); hence, the number of such tuples is doubly exponential in min(arity(Δ), intersize(Δ)). The lower bounds are by a polynomial reduction from the satisfiability problem for SL [9].


Example 2. The doubly exponential upper bound for the algorithm computing the least solution of a system of constraints of the form (1) is necessary, in general, as illustrated by the following worst-case example. Let n be a fixed parameter and consider the n-ary predicates A1, ..., An defined by the following SID:
AlAs Xn) — = Aia a, t42cRpais peat) for all i € [w= 1]
An(x1, ..., xn) ← ⟨x1.p, ..., xn.p⟩        An(x1, ..., xn) ← emp


where, for a list of variables x1, ..., xn and an integer j ≥ 0, we write [x1, ..., xn]^j for the list rotated to the left j times (e.g., [x1, x2, x3, x4, x5]^2 = x3, x4, x5, x1, x2). In this example, when starting with A1(x1, ..., xn) one eventually obtains predicate atoms An(x_{i1}, ..., x_{in}), for any permutation x_{i1}, ..., x_{in} of x1, ..., xn. Since An may choose to create or not an interaction with that permutation of variables, the total number of base tuples generated for A1 is 2^{n!}. That is, the fixpoint iteration generates 2^{2^{O(n log n)}} base tuples, whereas the size of the input of Sat[Δ, A] is poly(n).
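The passage from 2^{n!} to 2^{2^{O(n log n)}} in Example 2 follows from the elementary estimate n! ≤ nⁿ (a standard bound we add here, not one stated in the text):

$$ n! \;\le\; n^n \;=\; 2^{\,n \log_2 n} \qquad\Longrightarrow\qquad 2^{\,n!} \;\le\; 2^{\,2^{\,n \log_2 n}} \;=\; 2^{\,2^{\mathcal{O}(n \log n)}} $$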


4 Degree Boundedness 


The boundedness problem (Definition 5, point 2) asks for the existence of a bound on the degree (Definition 3) of the models of a sentence ∃x1 ... ∃x#A . A(x1, ..., x#A). Intuitively, the Bnd[Δ, A] problem has a negative answer if and only if there are increasingly large unfoldings (i.e., expansions of a formula by replacement of a predicate atom with one of its definitions) of A(x1, ..., x#A) repeating a rule that contains an interaction atom involving a parameter of the rule, which is always bound to the same component. We formalize the notion of unfolding below:


Definition 8. Given a predicate A and a sequence (r1, i1), ..., (rn, in) ∈ (Δ × ℕ)⁺, where r1 : A(x1, ..., x#A) ← φ ∈ Δ, the unfolding A(x1, ..., x#A) ⇒^{(r1,i1)...(rn,in)}_Δ ψ is inductively defined as (1) ψ = φ if n = 1, and (2) ψ is obtained from φ by replacing its i1-th predicate atom B(y1, ..., y#B) with ψ1[x1/y1, ..., x#B/y#B], where B(x1, ..., x#B) ⇒^{(r2,i2)...(rn,in)}_Δ ψ1 is an unfolding, if n > 1.


We show that the Bnd[Δ, A] problem can be reduced to the existence of increasingly large unfoldings or, equivalently, a cycle in a finite directed graph, built by a variant of the least fixpoint iteration algorithm used to solve the satisfiability problem (Fig. 3).


Definition 9. Given satisfiable base tuples t, u ∈ SatBase and a rule from Δ:

$$ r : A(x_1,\ldots,x_{\#A}) \leftarrow \exists y_1 \ldots \exists y_m .\ \varphi * B_1(z^1_1,\ldots,z^1_{\#B_1}) * \cdots * B_h(z^h_1,\ldots,z^h_{\#B_h}) $$

where φ is a quantifier- and predicate-free formula, we write (A, t) ⇝^{(r,i)}_Δ (B, u) if and only if B = B_i and there exist satisfiable base tuples t1, ..., u = t_i, ..., th ∈ SatBase such that t ∈ (Base(φ, {x1, ..., x#A}) ⊗ ⨂_{ℓ=1}^h tℓ[x1/z^ℓ_1, ..., x#Bℓ/z^ℓ_#Bℓ])↓{x1, ..., x#A}. We define the directed graph with edges labeled by pairs (r, i) ∈ Δ × ℕ:

$$ \mathcal{G}(\Delta) \stackrel{\mathrm{def}}{=} \big(\mathrm{def}(\Delta) \times \mathsf{SatBase},\ \{((A,t),(r,i),(B,u)) \mid (A,t) \leadsto^{(r,i)}_\Delta (B,u)\}\big) $$

The graph G(Δ) is built by the algorithm in Fig. 3, a slight variation of the classical Kleene iteration algorithm for the computation of the least solution of the constraints of the form (1). A path (A1, t1) ⇝^{(r1,i1)} (A2, t2) ⇝^{(r2,i2)} ... ⇝^{(rn−1,in−1)} (An, tn) in G(Δ) induces a unique


Decision Problems in a Logic for Reasoning 703 


input: a SID Δ    output: G(Δ) = (V, E)
1: initially V := ∅, E := ∅
2: for A(x1, ..., x#A) ← ∃y1 ... ∃ym . φ ∈ Δ, with φ quantifier- and predicate-free do
3:     V := V ∪ ({A} × Base(φ, {x1, ..., x#A})↓{x1, ..., x#A})
4: while V or E still change do
5:     for r : A(x1, ..., x#A) ← ∃y1 ... ∃ym . φ * B1(z^1_1, ..., z^1_#B1) * ... * Bh(z^h_1, ..., z^h_#Bh) ∈ Δ do
6:         if there exist (B1, t1), ..., (Bh, th) ∈ V then
7:             X := (Base(φ, {x1, ..., x#A}) ⊗ ⨂_{ℓ=1..h} tℓ[x1/z^ℓ_1, ..., x#Bℓ/z^ℓ_#Bℓ])↓{x1, ..., x#A}
8:             V := V ∪ ({A} × X)
9:             E := E ∪ {((A, t), (r, ℓ), (Bℓ, tℓ)) | t ∈ X, ℓ ∈ [1, h]}
Fig. 3. Algorithm for the Construction of G(Δ)
unfolding A1(x1, ..., x#A1) ⇒^{(r1,i1)...(rn−1,in−1)}_Δ φ (Definition 8). Since the vertices of G(Δ) are pairs (A, t), where t is a satisfiable base tuple, and the edges of G(Δ) reflect the construction of the base tuples from the least solution of the constraints (1), the outcome of this unfolding is always a satisfiable formula.

An elementary cycle of G(Δ) is a path from some vertex (B, u) back to itself, such that (B, u) does not occur on the path, except at its endpoints. The cycle is, moreover, reachable from (A, t) if and only if there exists a path (A, t) ⇝ ... ⇝ (B, u) in G(Δ). We reduce the complement of the Bnd[Δ, A] problem, namely the existence of an infinite set of models of ∃x1 ... ∃x#A . A(x1, ..., x#A) of unbounded degree, to the existence of a reachable elementary cycle in G(Δ'), where Δ' is obtained from Δ, as described in the following.

First, we consider, for each predicate B ∈ def(Δ), a predicate B', of arity #B + 1, not in def(Δ), i.e., the set of predicates for which there exists a rule in Δ. Second, for each rule B0(x1, ..., x#B0) ← ∃y1 ... ∃ym . φ * B1(z^1_1, ..., z^1_#B1) * ... * Bh(z^h_1, ..., z^h_#Bh) ∈ Δ, where φ is a quantifier- and predicate-free formula and iv(φ) ⊆ fv(φ) denotes the subset of variables occurring in interaction atoms in φ, the SID Δ' has the following rules:

$$ B'_0(x_1,\ldots,x_{\#B_0},x_{\#B_0+1}) \leftarrow \exists y_1 \ldots \exists y_m .\ \varphi * \mathop{\ast}_{\xi \in \mathrm{iv}(\varphi)} x_{\#B_0+1} \neq \xi * \mathop{\ast}_{\ell=1}^{h} B'_\ell(z^\ell_1,\ldots,z^\ell_{\#B_\ell},x_{\#B_0+1}) \tag{2} $$

$$ B'_0(x_1,\ldots,x_{\#B_0},x_{\#B_0+1}) \leftarrow \exists y_1 \ldots \exists y_m .\ \varphi * x_{\#B_0+1} = \xi * \mathop{\ast}_{\ell=1}^{h} B'_\ell(z^\ell_1,\ldots,z^\ell_{\#B_\ell},x_{\#B_0+1}) \tag{3} $$

for each variable ξ ∈ iv(φ), that occurs in an interaction atom in φ.


There exists a family of models (with respect to Δ) of ∃x1 ... ∃x#A . A(x1, ..., x#A) of unbounded degree if and only if these are models of ∃x1 ... ∃x#A+1 . A'(x1, ..., x#A+1) (with respect to Δ') and the last parameter of each predicate B' ∈ def(Δ') can be mapped, in each of these models, to a component that occurs in unboundedly many interactions. The latter condition is equivalent to the existence of an elementary cycle, containing a rule of the form (3), that is, moreover, reachable from some vertex (A', t) of G(Δ'), for some t ∈ SatBase. This reduction is formalized below:
Lemma 2. There exists an infinite sequence of configurations γ1, γ2, ... such that γi ⊨_Δ ∃x1 ... ∃x#A . A(x1, ..., x#A) and δ(γi) < δ(γi+1), for all i ≥ 1, if and only if G(Δ') has an elementary cycle containing a rule (3), reachable from a node (A', t), for t ∈ SatBase.
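Operationally, the condition of Lemma 2 amounts to finding an edge labeled by a rule of the form (3) that lies on a cycle reachable from the start vertex. A minimal Python sketch, assuming G(Δ') has already been built and is given as a list of labeled edges in our own encoding: an edge u → v lies on an elementary cycle iff u is reachable from v, because a shortest path from v back to u, closed by that edge, repeats no vertex.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# An edge ((A, t), (r, i), (B, u)) of G(Delta') is encoded as (source, (rule, i), target).
Edge = Tuple[Hashable, Tuple[str, int], Hashable]


def reachable(succ: Dict[Hashable, List[Hashable]], start: Hashable) -> Set[Hashable]:
    """Vertices reachable from start (including start), by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def has_reachable_marked_cycle(edges: Iterable[Edge], start: Hashable,
                               marked_rules: Set[str]) -> bool:
    """True iff an edge labeled by a marked rule lies on a cycle reachable from start."""
    edges = list(edges)
    succ: Dict[Hashable, List[Hashable]] = {}
    for u, _, v in edges:
        succ.setdefault(u, []).append(v)
    from_start = reachable(succ, start)
    return any(rule in marked_rules and u in from_start and u in reachable(succ, v)
               for u, (rule, _), v in edges)
```

Here `marked_rules` would contain the names of the rules of the form (3); recomputing reachability per edge keeps the sketch short at the price of a quadratic running time.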


The complexity result below uses a similar argument on the maximal size of (hence the number of) base tuples as in Theorem 1, leading to similar complexity gaps:


Theorem 2. Bnd^{k,∞}[Δ, A] is in co-NP, Bnd^{∞,ℓ}[Δ, A] is in EXP, and Bnd[Δ, A] is in 2EXP.
Moreover, the construction of G(Δ') allows us to prove the following cut-off result:


Proposition 1. Let γ be a configuration and ν be a store, such that γ ⊨^ν_Δ A(x1, ..., x#A). If Bnd[Δ, A] has a positive answer, then (1) δ(γ) = poly(size(Δ)) if k < ∞, ℓ = ∞, (2) δ(γ) = 2^{poly(size(Δ))} if k = ∞, ℓ < ∞, and (3) δ(γ) = 2^{2^{poly(size(Δ))}} if k = ∞, ℓ = ∞.


5 Entailment 


This section is concerned with the entailment problem Entl[Δ, A, B], which asks whether γ ⊨^ν_Δ ∃x#A+1 ... ∃x#B . B(x1, ..., x#B), for every configuration γ and store ν such that γ ⊨^ν_Δ A(x1, ..., x#A). For instance, the proof from Fig. 1(c) relies on the following entailments, which occur as the side conditions of the Hoare logic rule of consequence:


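$$ \mathit{ring}_{h,t}(y) \models_\Delta \exists x \exists z .\ [y]@H * \langle y.out, z.in\rangle * \mathit{chain}_{h-1,t}(z,x) * \langle x.out, y.in\rangle $$

$$ [z]@H * \langle z.out, x.in\rangle * \mathit{chain}_{h-1,t}(x,y) * \langle y.out, z.in\rangle \models_\Delta \mathit{ring}_{h,t}(z) $$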


By introducing two fresh predicates A1 and A2, defined by the rules:

$$ A_1(x_1) \leftarrow \exists y \exists z .\ [x_1]@H * \langle x_1.out, z.in\rangle * \mathit{chain}_{h-1,t}(z,y) * \langle y.out, x_1.in\rangle \tag{4} $$

$$ A_2(x_1,x_2) \leftarrow \exists z .\ [x_1]@H * \langle x_1.out, z.in\rangle * \mathit{chain}_{h-1,t}(z,x_2) * \langle x_2.out, x_1.in\rangle \tag{5} $$

the above entailments are equivalent to Entl[Δ, ring_{h,t}, A1] and Entl[Δ, A2, ring_{h,t}], respectively, where Δ consists of the rules (4) and (5), together with the rules that define the ring_{h,t} and chain_{h,t} predicates (Sect. 1.1).

We show that the entailment problem is undecidable, in general (Thm. 3), and 
recover a decidable fragment, by means of three syntactic conditions, typically met 
in our examples. These conditions use the following notion of profile: 


Definition 10. The profile of a SID Δ is the pointwise greatest function λ_Δ : A ↦ pow(ℕ), mapping each predicate A into a subset of [1, #A], such that, for each rule A(x1, ..., x#A) ← φ from Δ, each atom B(y1, ..., y#B) from φ and each i ∈ λ_Δ(B), there exists j ∈ λ_Δ(A) such that x_j and y_i are the same variable.


The profile identifies the parameters of a predicate that are always replaced by a variable x1, ..., x#A in each unfolding of A(x1, ..., x#A), according to the rules in Δ; it is computed by a greatest fixpoint iteration, in time poly(size(Δ)).
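The greatest fixpoint computation of the profile can be sketched as follows: start from λ(A) = [1, #A] for every predicate and remove positions that violate Definition 10 until nothing changes. The rule encoding is our own, only the parameter passing between predicate atoms is modeled, and every predicate occurring in a body is assumed to be defined by some rule; the chain/ring rules in the test are simplified, hypothetical stand-ins for the ones of Sect. 1.1.

```python
from typing import Dict, Sequence, Set, Tuple

# Hypothetical rule encoding: (head, [x1, ..., x#A], [(B, [y1, ..., y#B]), ...]),
# listing only the predicate atoms of the rule body.
Rule = Tuple[str, Sequence[str], Sequence[Tuple[str, Sequence[str]]]]


def profile(rules: Sequence[Rule]) -> Dict[str, Set[int]]:
    """Greatest fixpoint of Definition 10 (positions are 1-based)."""
    lam = {head: set(range(1, len(params) + 1)) for head, params, _ in rules}
    changed = True
    while changed:
        changed = False
        for head, params, atoms in rules:
            kept = {params[j - 1] for j in lam[head]}    # {x_j | j in lambda(A)}
            for pred, args in atoms:
                bad = {i for i in lam[pred] if args[i - 1] not in kept}
                if bad:                                  # y_i is not a kept parameter
                    lam[pred] -= bad
                    changed = True
    return lam


SIMPLIFIED_RULES = [
    ("chain", ["x1", "x2"], [("chain", ["z", "x2"])]),
    ("chain", ["x1", "x2"], []),
    ("ring", ["x1"], [("chain", ["z", "x1"])]),
]
```

Positions are only ever removed, and each removal is forced by the current (over-approximating) assignment, so the loop converges to the pointwise greatest solution.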


Definition 11. A rule A(x1, ..., x#A) ← ∃y1 ... ∃ym . φ * B1(z^1_1, ..., z^1_#B1) * ... * Bh(z^h_1, ..., z^h_#Bh), where φ is a quantifier- and predicate-free formula, is said to be:

1. progressing if and only if φ = [x1] * ψ, where ψ consists of interaction atoms involving x1 and (dis-)equalities, such that ⋃_{ℓ=1}^h {z^ℓ_1, ..., z^ℓ_#Bℓ} = {x2, ..., x#A} ∪ {y1, ..., ym},

2. connected if and only if, for each ℓ ∈ [1, h], there exists an interaction atom in ψ that contains both z^ℓ_1 and a variable from {x1} ∪ {x_i | i ∈ λ_Δ(A)},

3. equationally-restricted (e-restricted) if and only if, for every disequation x ≠ y from φ, we have {x, y} ∩ {x_i | i ∈ λ_Δ(A)} ≠ ∅.


A SID Δ is progressing, connected and e-restricted if and only if each rule in Δ is progressing, connected and e-restricted, respectively.


For example, the SID consisting of the rules from Sect. 1.1, together with rules (4) and 
(5) is progressing, connected and e-restricted. 

We recall that def_Δ(A) is the set of rules from Δ that define A, and denote by def*_Δ(A) the least superset of def_Δ(A) containing the rules that define a predicate from a rule in def*_Δ(A). The following result shows that the entailment problem becomes undecidable as soon as the connectivity condition is even slightly lifted:


Theorem 3. Entl[Δ, A, B] is undecidable, even when Δ is progressing and e-restricted, and only the rules in def*_Δ(A) are connected (the rules in def*_Δ(B) may be disconnected).


On the positive side, we prove that Entl[Δ, A, B] is decidable if Δ is progressing, connected and e-restricted, assuming further that Bnd[Δ, A] has a positive answer. In this case, the bound on the degree of the models of A(x1, ..., x#A) is effectively computable, using the algorithm from Fig. 3 (see Proposition 1 for a cut-off result); we denote this bound by 𝔅 throughout this section.

The proof uses a reduction of Entl[Δ, A, B] to a similar problem for SL, shown to be decidable [18]. We recall the definition of SL, interpreted over heaps h : ℂ ⇀_fin ℂ^ℜ, introduced in Sect. 2.3. SL rules are denoted as A(x1, ..., x#A) ← φ, where φ is an SL formula such that fv(φ) ⊆ {x1, ..., x#A}, and SL SIDs are denoted as Δ̄. The profile λ_Δ̄ is defined for SL in the same way as for CL (Definition 10).


Definition 12. An SL rule A(x1, ..., x#A) ← φ from a SID Δ̄ is said to be:

1. progressing if and only if φ = ∃t1 ... ∃tm . x1 ↦ (y1, ..., yℜ) * ψ, where ψ contains only predicate and equality atoms,
2. connected if and only if z1 ∈ {x_i | i ∈ λ_Δ̄(A)} ∪ {y1, ..., yℜ}, for every predicate atom B(z1, ..., z#B) from φ.


Note that the definitions of progressing and connected rules are different for SL, compared to CL (Definition 11); in the rest of this section, we rely on the context to distinguish progressing (connected) SL rules from progressing (connected) CL rules. Moreover, e-restricted rules are defined in the same way for CL and SL (point 3 of Definition 11). A tight upper bound on the complexity of the entailment problem between SL formulae, interpreted by progressing, connected and e-restricted SIDs, is given below:


Theorem 4 ([18]). The SL entailment problem is in (Aogas for progressing, connected and e-restricted SIDs.
The reduction of Entl[Δ, A, B] to SL entailments is based on the idea of viewing a configuration as a logical structure (hypergraph), represented by an undirected Gaifman graph, in which every tuple from a relation (hyperedge) becomes a clique [24]. In a similar vein, we encode a configuration, of degree at most 𝔅, by a heap of degree ℜ (Definition 13), such that ℜ is defined using the following integer function:


$$ \mathrm{pos}(i, j, k) \stackrel{\mathrm{def}}{=} 1 + \mathfrak{B} \cdot \sum_{\ell=1}^{j-1} |\tau_\ell| + i \cdot |\tau_j| + k $$


where Inter = {τ1, ..., τM} is the set of interaction types and Q = {q1, ..., qN} is the set of states of the behavior B = (P, Q, →) (Sect. 2). Here i ∈ [0, 𝔅 − 1] denotes an interaction of type j ∈ [1, M] and k ∈ [0, N − 1] denotes a state. We use M and N throughout the rest of this section to denote the number of interaction types and states, respectively.


For a set I of interactions, let Tuples^j_I(c) ≝ {(c1, ..., cn) | (c1, p1, ..., cn, pn) ∈ I, τ_j = (p1, ..., pn), c ∈ {c1, ..., cn}} be the tuples of components from an interaction of type τ_j from I that contain a given component c.


Definition 13. Given a configuration γ = (C, I, ρ) such that δ(γ) ≤ 𝔅, a Gaifman heap for γ is a heap h : ℂ ⇀_fin ℂ^ℜ, where ℜ ≝ pos(0, M + 1, N), dom(h) = nodes(γ) and, for all c0 ∈ dom(h) such that h(c0) = (c1, ..., cℜ), the following hold:

1. c1 = c0 if and only if c0 ∈ C,

2. for all j ∈ [1, M], Tuples^j_I(c0) = {c⃗1, ..., c⃗s} if and only if there exist integers 0 ≤ k1 < ... < ks < 𝔅 such that ⟨h(c0)⟩_{inter(ki, j)} = c⃗i for all i ∈ [1, s], where inter(i, j) ≝ [pos(i − 1, j, 0), pos(i, j, 0)] are the entries of the i-th interaction of type τ_j in h(c0),

3. for all k ∈ [1, N], we have ⟨h(c0)⟩_{state(k)} = c0 if and only if ρ(c0) = q_k, where the entry state(k) ≝ pos(0, M + 1, k − 1) in h(c0) corresponds to the state q_k ∈ Q.

We denote by G(γ) the set of Gaifman heaps for γ.
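Unfolding the definition of pos gives the heap degree explicitly. Since i = 0, the term i · |τ_j| vanishes, and only the presence entry, the 𝔅 · Σ|τ_ℓ| interaction entries and the N state entries remain; for instance, with the single interaction type (out, in) and 𝔅 = 2 of Example 3, this gives ℜ = 5 + N:

$$ \Re \;=\; \mathrm{pos}(0, M+1, N) \;=\; 1 + \mathfrak{B} \cdot \sum_{\ell=1}^{M} |\tau_\ell| + N $$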


Intuitively, if h is a Gaifman heap for γ and c0 ∈ dom(h), then the first entry of h(c0) indicates whether c0 is present (condition 1 of Definition 13), the next 𝔅 · Σ_{j=1}^M |τ_j| entries are used to encode the interactions of each type τ_j (condition 2 of Definition 13), whereas the last N entries are used to represent the state of the component (condition 3 of Definition 13). Note that the encoding of configurations by Gaifman heaps is not unique: two Gaifman heaps for the same configuration may differ in the order of the tuples from the encoding of an interaction type and the choice of the unconstrained entries from h(c0), for each c0 ∈ dom(h). On the other hand, if two configurations have the same Gaifman heap encoding, they must be the same configuration.


Example 3. Figure 4(b) shows a Gaifman heap for the configuration in Fig. 4(a), where each component belongs to at most 2 interactions of type (out, in).
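The encoding of Definition 13 can be prototyped as a function that builds one Gaifman heap for a given configuration. This is a sketch under our own assumptions: tuple entries are indexed from 0 (entry 0 is the presence flag, so the interaction block starts at pos(0, 1, 0) = 1 and the state block ends at ℜ − 1), unconstrained entries are filled with `None`, nodes(γ) is taken to be the present components plus those occurring in interactions, and the order of tuples within an interaction type is fixed by sorting.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

InterType = Tuple[str, ...]                 # e.g. ("out", "in")
Interaction = Tuple[Tuple[str, str], ...]   # ((c1, p1), ..., (cn, pn))


def pos(i: int, j: int, k: int, inter: Sequence[InterType], bound: int) -> int:
    """pos(i, j, k) = 1 + bound * sum_{l<j} |tau_l| + i * |tau_j| + k (j is 1-based)."""
    prefix = sum(len(tau) for tau in inter[: j - 1])
    width = len(inter[j - 1]) if j <= len(inter) else 0  # j = M + 1: state block
    return 1 + bound * prefix + i * width + k


def gaifman_heap(components: Set[str], interactions: Set[Interaction],
                 state: Dict[str, str], inter: Sequence[InterType],
                 states: Sequence[str], bound: int) -> Dict[str, List[Optional[str]]]:
    """One Gaifman heap for a configuration; entries are indexed from 0 here."""
    m, n = len(inter), len(states)
    arity = pos(0, m + 1, n, inter, bound)          # R = pos(0, M + 1, N)
    nodes = set(components) | {c for it in interactions for c, _ in it}
    heap: Dict[str, List[Optional[str]]] = {}
    for c0 in sorted(nodes):
        row: List[Optional[str]] = [None] * arity   # None marks an unconstrained entry
        if c0 in components:
            row[0] = c0                              # condition 1: c0 is present
        for j, tau in enumerate(inter, start=1):
            tuples = sorted(tuple(c for c, _ in it) for it in interactions
                            if tuple(p for _, p in it) == tau
                            and any(c == c0 for c, _ in it))
            if len(tuples) > bound:
                raise ValueError("degree of the configuration exceeds the bound")
            for slot, comps in enumerate(tuples):    # condition 2, one slot per tuple
                start = pos(slot, j, 0, inter, bound)
                row[start:start + len(tau)] = list(comps)
        if c0 in state:                              # condition 3: state q_k
            k = states.index(state[c0]) + 1
            row[pos(0, m + 1, k - 1, inter, bound)] = c0
        heap[c0] = row
    return heap
```

For a chain x → y → z with interactions ⟨x.out, y.in⟩ and ⟨y.out, z.in⟩, a bound of 2 and two states, every row has 7 entries, and the row of y lists both of its interactions.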


We build an SL SID Δ̄ that generates the Gaifman heaps of the models of the predicate atoms occurring in a progressing CL SID Δ. The construction associates to each variable x, that occurs free or bound in a rule from Δ, a unique ℜ-tuple of variables η(x) ∈ V^ℜ,

Fig. 4. Gaifman Heap for a Chain Configuration: (a) the configuration; (b) a Gaifman heap for it.



that represents the image of the store value ν(x) in a Gaifman heap h, i.e., h(ν(x)) = ν(η(x)). Moreover, we consider, for each predicate symbol A ∈ def(Δ), an annotated predicate symbol A_ι of arity #A_ι = (ℜ + 1) · #A, where ι : [1, #A] × [1, M] → 2^{[0, 𝔅−1]} is a map associating with each parameter i ∈ [1, #A] and each interaction type τ_j, for j ∈ [1, M], a set of integers ι(i, j) denoting the positions of the encodings of the interactions of type τ_j, involving the value of x_i, in the models of A_ι(x1, ..., x#A, η(x1), ..., η(x#A)) (point 2 of Definition 13). Then Δ̄ contains rules of the form:


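$$ A_\iota(x_1,\ldots,x_{\#A},\eta(x_1),\ldots,\eta(x_{\#A})) \leftarrow \exists y_1 \ldots \exists y_m \exists \eta(y_1) \ldots \exists \eta(y_m) .\ \overline{\psi} * \pi * \mathop{\ast}_{\ell=1}^{h} (B_\ell)_{\iota_\ell}(z^\ell_1,\ldots,z^\ell_{\#B_\ell},\eta(z^\ell_1),\ldots,\eta(z^\ell_{\#B_\ell})) \tag{6} $$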


for which Δ has a stem rule A(x1, ..., x#A) ← ∃y1 ... ∃ym . ψ * π * B1(z^1_1, ..., z^1_#B1) * ... * Bh(z^h_1, ..., z^h_#Bh), where ψ * π is a quantifier- and predicate-free formula and π is the conjunction of equalities and disequalities from ψ * π. However, not all rules (6) are considered in Δ̄, but only the ones meeting the following condition:


Definition 14. A rule of the form (6) is well-formed if and only if, for each i ∈ [1, #A] and each j ∈ [1, M], there exists a set of integers Y_{i,j} ⊆ [0, 𝔅 − 1] such that:

- |Y_{i,j}| = ||K^j_ψ(x_i)||, where K^j_ψ(x) is the set of interaction atoms ⟨z1.p1, ..., zn.pn⟩ from ψ of type τ_j = (p1, ..., pn) such that z_s ≈_π x, for some s ∈ [1, n],
- Y_{i,j} ⊆ ι(i, j) and ι(i, j) \ Y_{i,j} = Z_j(x_i), where Z_j(x) ≝ ⋃_{ℓ=1}^h ⋃_{k=1}^{#Bℓ} {ι_ℓ(k, j) | x ≈_π z^ℓ_k} is the set of positions used to encode the interactions of type τ_j involving the store value of the parameter x in the sub-configuration corresponding to an atom Bℓ(z^ℓ_1, ..., z^ℓ_#Bℓ), for some ℓ ∈ [1, h].


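We denote by Δ̄ the set of well-formed rules (6) such that, moreover:

$$ \overline{\psi} \stackrel{\mathrm{def}}{=} x_1 \mapsto \eta(x_1) * \mathop{\ast}_{x \in \mathrm{fv}(\psi)} \mathit{CompStates}_\psi(x) * \mathop{\ast}_{i=1}^{\#A} \mathit{InterAtoms}_\psi(x_i) $$

where

$$ \mathit{CompStates}_\psi(x) \stackrel{\mathrm{def}}{=} \mathop{\ast}_{[x] \text{ occurs in } \psi} \langle\eta(x)\rangle_1 = x \;*\; \mathop{\ast}_{x@q_s \text{ occurs in } \psi} \langle\eta(x)\rangle_{\mathit{state}(s)} = x $$

and InterAtoms_ψ(x_i) equates, for each j ∈ [1, M], the entries ⟨η(x_i)⟩_{inter(k^j_1, j)}, ..., ⟨η(x_i)⟩_{inter(k^j_s, j)} with the tuples of variables of the interaction atoms from K^j_ψ(x_i), where {k^j_1, ..., k^j_s} = ι(i, j) \ Z_j(x_i).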


Here, for two tuples of variables x⃗ = (x1, ..., xk) and y⃗ = (y1, ..., yk), we denote by x⃗ = y⃗ the formula *_{i=1}^k x_i = y_i. Intuitively, the SL formula CompStates_ψ(x) realizes the encoding of the component and state atoms from ψ, in the sense of points (1) and (3) of Definition 13, whereas the formula InterAtoms_ψ(x_i) realizes the encodings of the interactions involving a parameter x_i in the stem rule (point 2 of Definition 13). In particular, the definition of InterAtoms_ψ(x_i) uses the fact that the rule is well-formed.

We state below the main result of this section on the complexity of the entailment problem. The upper bounds follow from a many-one reduction of Entl[Δ, A, B] to an SL entailment, with respect to Δ̄, between the annotated atom A_ι(x1, ..., x#A, η(x1), ..., η(x#A)) and an existentially quantified annotated B-atom, in combination with the upper bound provided by Theorem 4 for SL entailments. If k < ∞, the complexity is tight for CL, whereas gaps occur for k = ∞, ℓ < ∞ and k = ∞, ℓ = ∞, due to the cut-off on the degree bound (Proposition 1), which impacts the size of Δ̄ and the time needed to generate it from Δ.


Theorem 5. If Δ is progressing, connected and e-restricted and, moreover, Bnd[Δ, A] has a positive answer, Entl^{k,∞}[Δ, A, B] is in 2EXP, Entl^{∞,ℓ}[Δ, A, B] is in 3EXP ∩ 2EXP-hard, and Entl[Δ, A, B] is in 4EXP ∩ 2EXP-hard.


6 Conclusions and Future Work 


We study the satisfiability and entailment problems in a logic used to write proofs of 
correctness for dynamically reconfigurable distributed systems. The logic views the 
components and interactions from the network as resources and reasons also about the 
local states of the components. We reuse existing techniques for Separation Logic [39], 
showing that our configuration logic is more expressive than SL, a fact which is confirmed
by a number of complexity gaps. Closing up these gaps and finding tight complexity 
classes in the more general cases is considered for future work. In particular, we aim 
at lifting the boundedness assumption on the degree of the configurations that must be 
considered to check the validity of entailments. 

Abstract. We present the Loop Acceleration Tool (LoAT), a powerful tool for proving non-termination and worst-case lower bounds for programs operating on integers. It is based on the novel calculus from [10,11] for loop acceleration, i.e., transforming loops into non-deterministic straight-line code, and for finding non-terminating configurations. To implement it efficiently, LoAT uses a new approach based on unsat cores. We evaluate LoAT's power and performance by extensive experiments.


1 Introduction 


Efficiency is one of the most important properties of software. Consequently, 
automated complexity analysis is of high interest to the software verification 
community. Most research in this area has focused on deducing upper bounds on 
the worst-case complexity of programs. In contrast, the Loop Acceleration Tool 
LoAT aims to find performance bugs by deducing lower bounds on the worst-case 
complexity of programs operating on integers. Since non-termination implies the 
lower bound ∞, LoAT is also equipped with non-termination techniques.

LoAT is based on loop acceleration [4,5,9–11,15], which replaces loops by non-deterministic code: the resulting program chooses a value n, representing the number of loop iterations in the original program. To be sound, suitable constraints on n are synthesized to ensure that the original loop allows for at least n iterations. Moreover, the transformed program updates the program variables to the same values as n iterations of the original loop, but it does so in a single step. To achieve that, the loop body is transformed into a closed form, which is parameterized in n. In this way, LoAT is able to compute symbolic under-approximations of programs, i.e., every execution path in the resulting transformed program corresponds to a path in the original program, but not necessarily vice versa. In contrast to many other techniques for computing under-approximations, the symbolic approximations of LoAT cover infinitely many runs of arbitrary length.
Contributions: The main new feature of the novel version of LoAT presented in this paper is the integration of the loop acceleration calculus from [10,11], which combines different loop acceleration techniques in a modular way, into LoAT's framework. This enables LoAT to use the loop acceleration calculus for the analysis of full integer programs, whereas the standalone implementation of the calculus from [10,11] was only applicable to single loops without branching in the body. To control the application of the calculus, we use a new technique based on unsat cores (see Sect. 5). The new version of LoAT is evaluated in extensive experiments. See [14] for all proofs.


2 Preliminaries 


Let ℒ ⊇ {main} be a finite set of locations, where main is the canonical start location (i.e., the entry point of the program), and let x⃗ := [x1, ..., xd] be the vector of program variables. Furthermore, let 𝒯𝒱 be a countably infinite set of temporary variables, which are used to model non-determinism, and let sup ℤ := ∞. We call an arithmetic expression e an integer expression if it evaluates to an integer when all variables in e are instantiated by integers. LoAT analyzes tail-recursive programs operating on integers, represented as integer transition systems (ITSs), i.e., sets of transitions f(x⃗) →^p g(a⃗) [φ] where f, g ∈ ℒ, the update a⃗ is a vector of d integer expressions over 𝒯𝒱 ∪ x⃗, the cost p is either an arithmetic expression over 𝒯𝒱 ∪ x⃗ or ∞, and the guard φ is a conjunction of inequations over integer expressions with variables from 𝒯𝒱 ∪ x⃗.¹ For example, consider the loop on the left and the corresponding transition t_loop on the right.


$$ \texttt{while } x > 0 \texttt{ do } x \leftarrow x - 1 \qquad\qquad f(x) \xrightarrow{1} f(x - 1)\ [x > 0] \tag{$t_{\mathit{loop}}$} $$


Here, the cost 1 instructs LoAT to use the number of loop iterations as cost measure. LoAT allows for arbitrary user-defined cost measures, since the user can choose any polynomials over the program variables as costs. LoAT synthesizes transitions with cost ∞ to represent non-terminating runs, i.e., such transitions are not allowed in the input.

A configuration is of the form f(c⃗) with f ∈ ℒ and c⃗ ∈ ℤ^d. For any entity s ∉ ℒ and any arithmetic expressions b⃗ = [b1, ..., bd], let s(b⃗) denote the result of replacing each variable x_i in s by b_i, for all 1 ≤ i ≤ d. Moreover, Vars(s) denotes the program variables and 𝒯𝒱(s) denotes the temporary variables occurring in s. For an integer transition system 𝒯, a configuration f(c⃗) evaluates to g(c⃗') with cost k ∈ ℤ ∪ {∞}, written f(c⃗) →^k_𝒯 g(c⃗'), if there exist a transition f(x⃗) →^p g(a⃗) [φ] ∈ 𝒯 and an instantiation of its temporary variables with integers such that the following holds:

$$ \varphi(\vec{c}) \wedge \vec{c}\,' = \vec{a}(\vec{c}) \wedge k = p(\vec{c}). $$
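To make the evaluation relation concrete, the sketch below represents a transition by Python callables for the guard φ, the update a⃗ and the cost p, and enumerates the one-step successors of a configuration. The encoding is our own and is not LoAT's data structure; since temporary variables range over all of ℤ, the sketch only tries values from a finite, caller-supplied window (so it under-approximates the successor set), and it assumes finite costs.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Env = Dict[str, int]  # values of program and temporary variables


@dataclass(frozen=True)
class Transition:
    """f(x) ->^p g(a) [phi], with guard, update and cost as Python callables."""
    source: str
    target: str
    temps: Tuple[str, ...]                     # temporary variables of the transition
    guard: Callable[[Env], bool]               # phi
    update: Callable[[Env], Tuple[int, ...]]   # a
    cost: Callable[[Env], int]                 # p (finite costs only)


def successors(its: Sequence[Transition], prog_vars: Sequence[str], loc: str,
               values: Tuple[int, ...], window: range) -> List[Tuple[str, Tuple[int, ...], int]]:
    """One-step successors (g, c', k) of f(c); temporaries are drawn from `window`."""
    result = []
    for t in its:
        if t.source != loc:
            continue
        for choice in product(window, repeat=len(t.temps)):
            env = dict(zip(prog_vars, values))
            env.update(zip(t.temps, choice))
            if t.guard(env):
                result.append((t.target, t.update(env), t.cost(env)))
    return result


# t_loop and the accelerated loop, with the temporary variable n.
t_loop = Transition("f", "f", (), lambda e: e["x"] > 0,
                    lambda e: (e["x"] - 1,), lambda e: 1)
t_plus = Transition("f", "f", ("n",), lambda e: e["x"] >= e["n"] > 0,
                    lambda e: (e["x"] - e["n"],), lambda e: e["n"])
```

With x = 3, `t_loop` yields the single successor f(2) with cost 1, while `t_plus` yields f(2), f(1) and f(0) with costs 1, 2 and 3, one per admissible choice of n.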


¹ LoAT can also analyze the complexity of certain non-tail-recursive programs, see [9].
For simplicity, we restrict ourselves to tail-recursive programs in the current paper. 


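As usual, we write f(c⃗) →^k_𝒯* g(c⃗') if f(c⃗) evaluates to g(c⃗') in arbitrarily many steps, and the sum of the costs of all steps is k. We omit the costs if they are irrelevant. The derivation height of f(c⃗) is

$$ \mathrm{dh}_{\mathcal{T}}(f(\vec{c})) := \sup\{k \mid \exists g(\vec{c}\,').\ f(\vec{c}) \xrightarrow{k}{}^{*}_{\mathcal{T}} g(\vec{c}\,')\} $$

and the runtime complexity of 𝒯 is

$$ \mathrm{rc}_{\mathcal{T}}(n) := \sup\{\mathrm{dh}_{\mathcal{T}}(\mathit{main}(c_1,\ldots,c_d)) \mid |c_1| + \cdots + |c_d| \le n\}. $$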


𝒯 terminates if no configuration main(c⃗) admits an infinite →_𝒯-sequence, and 𝒯 is finitary if no configuration main(c⃗) admits a →_𝒯-sequence with cost ∞. Otherwise, c⃗ is a witness of non-termination or a witness of infinitism, respectively. Note that termination implies finitism for ITSs where no transition has cost ∞. However, our approach may transform non-terminating ITSs into terminating, infinitary ITSs, as it replaces non-terminating loops by transitions with cost ∞.


3 Overview of LoAT 


The goal of LoAT is to compute a lower bound on rc_𝒯 or even prove non-termination of 𝒯. To this end, it repeatedly applies program simplifications, so-called processors. When applying them with a suitable strategy (see [8,9]), one eventually obtains simplified transitions of the form main(x⃗) →^p f(a⃗) [φ] where f ≠ main. As LoAT's processors are sound for lower bounds (i.e., if they transform 𝒯 to 𝒯', then dh_𝒯 ≥ dh_𝒯'), such a simplified transition gives rise to the lower bound I_φ · p on dh_𝒯(main(x⃗)) (where I_φ denotes the indicator function of φ, which is 1 for values where φ holds and 0 otherwise). This bound can be lifted to rc_𝒯 by solving a so-called limit problem, see [9].

LoAT's processors are also sound for non-termination, as they preserve finitism. So if p = ∞, then it suffices to prove satisfiability of φ to prove infinitism, which implies non-termination of the original ITS, where transitions with cost ∞ are forbidden (see Sect. 2). LoAT's most important processors are:


Loop Acceleration (Sect. 4) transforms a simple loop, i.e., a single transition f(x⃗) →^p f(a⃗) [φ], into a non-deterministic transition that can simulate several loop iterations in one step. For example, loop acceleration transforms t_loop to

$$ f(x) \xrightarrow{n} f(x - n)\ [x \ge n \wedge n > 0], \tag{$t_{\mathit{loop}}^{+}$} $$

where n ∈ 𝒯𝒱, i.e., the value of n can be chosen non-deterministically.
Instantiation [9, Theorem 3.12] replaces temporary variables by integer expressions. For example, it could instantiate n with x in t_loop^+, resulting in

$$ f(x) \xrightarrow{x} f(0)\ [x > 0]. \tag{$t_{\mathit{loop}}^{i}$} $$
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Chaining [9, Theorem 3.18] combines two subsequent transitions into one tran- 
sition. For example, chaining combines the transitions 


main(x) —> f(x) 


and ticope to main(x) — f(0) [x > 0]. 


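Nonterm (Sect. 6) searches for witnesses of non-termination, characterized by a formula $\psi$. So it turns, e.g.,

$$f(x_1, x_2) \to f(x_1 - x_2, x_2)\ [x_1 > 0] \qquad (t_{\mathit{nonterm}})$$

into $f(x_1, x_2) \xrightarrow{\omega} \mathit{sink}(x_1, x_2)\ [x_1 > 0 \wedge x_2 \leq 0]$ (where $\mathit{sink}$ is a fresh function symbol), as each $\vec{c} \in \mathbb{Z}^2$ with $c_1 > 0 \wedge c_2 \leq 0$ witnesses non-termination of $t_{\mathit{nonterm}}$, i.e., here $\psi$ is $x_1 > 0 \wedge x_2 \leq 0$.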
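The four processor examples above can be sanity-checked on small integers. The following Python sketch is an illustration only, not LoAT's implementation: it assumes that $t_{\mathit{loop}}$ is $f(x) \to f(x-1)\ [x > 0]$ (this section does not restate it, but it is consistent with the accelerated transition), and it tests each claim by brute force over a bounded range, which is evidence rather than a proof. All function names are hypothetical.

```python
# Brute-force sanity checks for the processor examples (illustration only, not LoAT code).
# Assumption: t_loop is f(x) -> f(x - 1) [x > 0]; this section does not restate it.

def run_loop(x, n):
    """Apply t_loop n times from x; return the final value, or None if the guard fails."""
    for _ in range(n):
        if not x > 0:
            return None
        x = x - 1
    return x


def accelerated(x, n):
    """Accelerated transition t_loop^n: f(x) -> f(x - n) [x >= n and n > 0]."""
    return x - n if (x >= n and n > 0) else None


def chain(t1, t2):
    """Chain two transitions given as (guard, update) pairs over one integer variable."""
    g1, u1 = t1
    g2, u2 = t2
    return (lambda x: g1(x) and g2(u1(x)), lambda x: u2(u1(x)))


def is_witness(x1, x2):
    """Witness formula for t_nonterm: x1 > 0 and x2 <= 0."""
    return x1 > 0 and x2 <= 0


def check_all(bound=15):
    box = range(-bound, bound)
    # Acceleration: for every n > 0, t_loop^n fires exactly when n loop steps are possible.
    acc_ok = all(accelerated(x, n) == run_loop(x, n) for x in box for n in range(1, bound))
    # Instantiation n := x yields f(x) -> f(0) [x > 0].
    inst_ok = all(accelerated(x, x) == (0 if x > 0 else None) for x in box)
    # Chaining main(x) -> f(x) with t_loop^x yields guard x > 0 and result 0.
    guard, upd = chain((lambda x: True, lambda x: x), (lambda x: x > 0, lambda x: 0))
    chain_ok = all(guard(x) == (x > 0) and upd(x) == 0 for x in box)
    # Nonterm: every witness satisfies the guard x1 > 0 and is mapped to another witness.
    nonterm_ok = all(is_witness(x1 - x2, x2)
                     for x1 in box for x2 in box if is_witness(x1, x2))
    return acc_ok, inst_ok, chain_ok, nonterm_ok
```

The witness check reflects why $x_2 \leq 0$ suffices: one step maps $x_1$ to $x_1 - x_2 \geq x_1 > 0$ and leaves $x_2$ unchanged.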


Intuitively, LoAT uses Chaining to transform non-simple loops into simple 
loops. Instantiation resolves non-determinism heuristically and thus reduces 
the number of temporary variables, which is crucial for scalability. In addition 
to these processors, LoAT removes transitions after processing them, as explained 
in [9]. See [8,9] for heuristics and a suitable strategy to apply LoAT’s processors. 


4 Modular Loop Acceleration 


For Loop Acceleration, LoAT uses conditional acceleration techniques [10]. Given two formulas $\xi$ and $\check{\psi}$, and a loop with update $\vec{a}$, a conditional acceleration technique yields a formula $\mathit{accel}(\xi, \check{\psi}, \vec{a})$ which implies that $\xi$ holds throughout $n$ loop iterations (i.e., $\xi$ is an $n$-invariant), provided that $\check{\psi}$ is an $n$-invariant, too. In the following, let $\vec{a}^0(\vec{x}) := \vec{x}$ and $\vec{a}^{m+1}(\vec{x}) := \vec{a}(\vec{a}^m(\vec{x})) = \vec{a}[\vec{x}/\vec{a}^m(\vec{x})]$.


Definition 1 (Conditional Acceleration Technique). A function $\mathit{accel}$ is a conditional acceleration technique if the following implication holds for all formulas $\xi$ and $\check{\psi}$ with variables from $\mathcal{TV} \cup \vec{x}$, all updates $\vec{a}$, all $n > 0$, and all instantiations of the variables with integers:

$$\big(\mathit{accel}(\xi, \check{\psi}, \vec{a}) \wedge \forall i \in [0, n).\ \check{\psi}(\vec{a}^i(\vec{x}))\big) \implies \forall i \in [0, n).\ \xi(\vec{a}^i(\vec{x})).$$

The prerequisite $\forall i \in [0, n).\ \check{\psi}(\vec{a}^i(\vec{x}))$ is ensured by previous acceleration steps, i.e., $\check{\psi}$ is initially $\top$ (true), and it is refined by conjoining a part $\xi$ of the loop guard in each acceleration step. When formalizing acceleration techniques, we only specify the result of $\mathit{accel}$ for certain arguments $\xi$, $\check{\psi}$, and $\vec{a}$, and assume $\mathit{accel}(\xi, \check{\psi}, \vec{a}) = \bot$ (false) otherwise.
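Definition 1 can be tested, though not proved, on a finite box of integer states. In the sketch below, formulas are Python predicates over a state tuple, the update is a Python function, and the candidate result of $\mathit{accel}$ is a predicate over the state and $n$. The names `iterate` and `holds_on_box` and their parameters are hypothetical, and a bounded check says nothing about states outside the box.

```python
from itertools import product


def iterate(update, i, state):
    """Return a^i(state), i.e., apply the update i times (a^0 is the identity)."""
    for _ in range(i):
        state = update(state)
    return state


def holds_on_box(accel_formula, xi, check, update, dims=2, bound=6, max_n=6):
    """Test the implication of Definition 1 on [-bound, bound)^dims and n in [1, max_n].

    accel_formula(state, n) plays the role of accel(xi, check, a); check is the
    prerequisite that is assumed to be an n-invariant."""
    for state in product(range(-bound, bound), repeat=dims):
        for n in range(1, max_n + 1):
            premise = accel_formula(state, n) and all(
                check(iterate(update, i, state)) for i in range(n))
            conclusion = all(xi(iterate(update, i, state)) for i in range(n))
            if premise and not conclusion:
                return False  # counterexample found inside the box
    return True
```

For instance, with the update $(x, y) \mapsto (x - y, y)$ of Example 4, the candidate $x - (n-1) \cdot y > 0$ for $\xi = (x > 0)$ under the prerequisite $y > 0$ passes, whereas a candidate that is unconditionally true is rejected.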
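Definition 2 (LoAT's Conditional Acceleration Techniques [10,11]).

Increase: $\mathit{accel}_{\mathit{inc}}(\xi, \check{\psi}, \vec{a}) := \xi$ if $\models \xi \wedge \check{\psi} \implies \xi(\vec{a})$

Decrease: $\mathit{accel}_{\mathit{dec}}(\xi, \check{\psi}, \vec{a}) := \xi(\vec{a}^{n-1}(\vec{x}))$ if $\models \xi(\vec{a}) \wedge \check{\psi} \implies \xi$

Eventual Decrease: $\mathit{accel}_{\mathit{ev\text{-}dec}}(t > 0, \check{\psi}, \vec{a}) := t > 0 \wedge t(\vec{a}^{n-1}(\vec{x})) > 0$ if $\models t \geq t(\vec{a}) \wedge \check{\psi} \implies t(\vec{a}) \geq t(\vec{a}(\vec{a}))$

Eventual Increase: $\mathit{accel}_{\mathit{ev\text{-}inc}}(t > 0, \check{\psi}, \vec{a}) := t > 0 \wedge t \leq t(\vec{a})$ if $\models t \leq t(\vec{a}) \wedge \check{\psi} \implies t(\vec{a}) \leq t(\vec{a}(\vec{a}))$

Fixpoint: $\mathit{accel}_{\mathit{fp}}(t > 0, \check{\psi}, \vec{a}) := t > 0 \wedge \bigwedge_{x \in \mathit{closure}_{\vec{a}}(t)} x = x(\vec{a})$

where $\mathit{closure}_{\vec{a}}(t) := \bigcup_{i \in \mathbb{N}} \mathit{vars}(t(\vec{a}^i(\vec{x})))$.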
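The four implication-based techniques of Definition 2 can be transcribed almost literally. In this hypothetical sketch, validity is approximated by enumerating a small box of integer pairs instead of calling an SMT solver, so returning `None` (standing for $\bot$) is reliable only when a counterexample lies inside the box, and a returned formula is only as trustworthy as that bounded check. Fixpoint is omitted because computing $\mathit{closure}_{\vec{a}}(t)$ needs symbolic variables.

```python
from itertools import product


def iterate(update, i, state):
    """Return a^i(state)."""
    for _ in range(i):
        state = update(state)
    return state


def valid_on_box(formula, bound=6):
    """Bounded stand-in for an SMT validity check over integer pairs (x, y)."""
    return all(formula(s) for s in product(range(-bound, bound), repeat=2))


def implies(premise, conclusion):
    return (not premise) or conclusion


def accel_inc(xi, check, update):
    # Increase: xi, if |= xi /\ check ==> xi(a)
    if valid_on_box(lambda s: implies(xi(s) and check(s), xi(update(s)))):
        return lambda s, n: xi(s)
    return None  # None plays the role of "false" (technique not applicable)


def accel_dec(xi, check, update):
    # Decrease: xi(a^(n-1)(x)), if |= xi(a) /\ check ==> xi
    if valid_on_box(lambda s: implies(xi(update(s)) and check(s), xi(s))):
        return lambda s, n: xi(iterate(update, n - 1, s))
    return None


def accel_ev_dec(t, check, update):
    # Eventual Decrease for t > 0: t > 0 /\ t(a^(n-1)(x)) > 0,
    # if |= t >= t(a) /\ check ==> t(a) >= t(a(a))
    if valid_on_box(lambda s: implies(t(s) >= t(update(s)) and check(s),
                                      t(update(s)) >= t(update(update(s))))):
        return lambda s, n: t(s) > 0 and t(iterate(update, n - 1, s)) > 0
    return None


def accel_ev_inc(t, check, update):
    # Eventual Increase for t > 0: t > 0 /\ t <= t(a),
    # if |= t <= t(a) /\ check ==> t(a) <= t(a(a))
    if valid_on_box(lambda s: implies(t(s) <= t(update(s)) and check(s),
                                      t(update(s)) <= t(update(update(s))))):
        return lambda s, n: t(s) > 0 and t(s) <= t(update(s))
    return None
```

With the update $(x, y) \mapsto (x - y, y)$, Increase applies to $y > 0$ but not to $x > 0$, while Decrease applies to $x > 0$ under the prerequisite $y > 0$.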


The above five techniques are taken from [10,11], where only deterministic 
loops are considered (i.e., there are no temporary variables). Lifting them to 
non-deterministic loops in a way that allows for exact conditional acceleration 
techniques (which capture all possible program runs) is non-trivial and beyond 
the scope of this paper. Thus, we sacrifice exactness and treat temporary vari- 
ables like additional constant program variables whose update is the identity, 
resulting in a sound under-approximation (that captures a subset of all possible 
runs). 

So essentially, Increase and Decrease handle inequations $t > 0$ in the loop guard where $t$ increases or decreases (weakly) monotonically when applying the loop's update. The canonical examples where Increase or Decrease applies are

$$f(x, \ldots) \to f(x + 1, \ldots)\ [x > 0 \wedge \ldots] \quad\text{or}\quad f(x, \ldots) \to f(x - 1, \ldots)\ [x > 0 \wedge \ldots],$$

respectively. Eventual Decrease applies if $t$ never increases again once it starts to decrease. The canonical example is $f(x, y, \ldots) \to f(x + y, y - 1, \ldots)\ [x > 0 \wedge \ldots]$. Similarly, Eventual Increase applies if $t$ never decreases again once it starts to increase. Fixpoint can be used for inequations $t > 0$ that do not behave (eventually) monotonically. It should only be used if $\mathit{accel}_{\mathit{fp}}(t > 0, \check{\psi}, \vec{a})$ is satisfiable.
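For the canonical Eventual Decrease loop, the value of $x$ after $i$ iterations is $x + i \cdot y - \frac{i(i-1)}{2}$, which is concave in $i$. Its minimum over the first $n$ iterations is therefore attained at $i = 0$ or $i = n - 1$, which is why checking only $t > 0$ and $t(\vec{a}^{n-1}(\vec{x})) > 0$ suffices. The sketch below checks this by brute force on a small box; it considers only the guard $x > 0$ (the "$\ldots$" parts are ignored) and the function names are hypothetical.

```python
from itertools import product


def update(state):
    """Canonical Eventual Decrease loop body: (x, y) -> (x + y, y - 1)."""
    x, y = state
    return (x + y, y - 1)


def x_values(state, n):
    """Values of x in the first n iterations: x(a^0(state)), ..., x(a^(n-1)(state))."""
    values = []
    for _ in range(n):
        values.append(state[0])
        state = update(state)
    return values


def ev_dec_result_sound(bound=8, max_n=10):
    """Check that x > 0 and x(a^(n-1)) > 0 imply x > 0 in each of the first n iterations."""
    for state in product(range(-bound, bound), repeat=2):
        for n in range(1, max_n + 1):
            xs = x_values(state, n)
            if xs[0] > 0 and xs[-1] > 0 and not all(v > 0 for v in xs):
                return False
    return True
```

For example, starting from $(x, y) = (1, 2)$ the values of $x$ are 1, 3, 4, 4, 3: they rise first and never increase again once they start to decrease.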

LoAT uses the acceleration calculus of [10]. It operates on acceleration problems $[\![\psi \mid \check{\varphi} \mid \hat{\varphi}]\!]_{\vec{a}}$, where $\psi$ (which is initially $\top$) is repeatedly refined. When it stops, $\psi$ is used as the guard of the resulting accelerated transition. The formulas $\check{\varphi}$ and $\hat{\varphi}$ are the parts of the loop guard that have already or have not yet been handled, respectively. So $\check{\varphi}$ is initially $\top$, and $\hat{\varphi}$ and $\vec{a}$ are initialized with the guard $\varphi$ and the update of the loop $f(\vec{x}) \to f(\vec{a})\ [\varphi]$ under consideration, i.e., the initial acceleration problem is $[\![\top \mid \top \mid \varphi]\!]_{\vec{a}}$. Once $\hat{\varphi}$ is $\top$, the loop is accelerated to $f(\vec{x}) \xrightarrow{q} f(\vec{a}^n(\vec{x}))\ [\psi \wedge n > 0]$, where the cost $q$ and a closed form for $\vec{a}^n(\vec{x})$ are computed by the recurrence solver PURRS [2].


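Definition 3 (Acceleration Calculus for Conjunctive Loops). The relation $\rightsquigarrow$ on acceleration problems is defined as

$$\frac{\mathit{accel}(\xi, \check{\varphi}, \vec{a}) = \psi_2 \qquad \mathit{accel} \text{ is a conditional acceleration technique}}{[\![\psi_1 \mid \check{\varphi} \mid \xi \wedge \hat{\varphi}]\!]_{\vec{a}} \rightsquigarrow [\![\psi_1 \wedge \psi_2 \mid \check{\varphi} \wedge \xi \mid \hat{\varphi}]\!]_{\vec{a}}}$$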
So to accelerate a loop, one picks a not yet handled part $\xi$ of the guard in each step. When accelerating $f(\vec{x}) \to f(\vec{a})\ [\xi]$ using a conditional acceleration technique $\mathit{accel}$, one may assume $\forall i \in [0, n).\ \check{\varphi}(\vec{a}^i(\vec{x}))$. The result of $\mathit{accel}$ is conjoined to the result $\psi_1$ computed so far, and $\xi$ is moved from the third to the second component of the problem, i.e., to the already handled part of the guard.


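Example 4 (Acceleration Calculus). We show how to accelerate the loop

$$f(x, y) \xrightarrow{x} f(x - y, y)\ [x > 0 \wedge y > 0]$$

to

$$f(x, y) \xrightarrow{(x + \frac{y}{2}) \cdot n - \frac{y}{2} \cdot n^2} f(x - n \cdot y, y)\ [y > 0 \wedge x - (n - 1) \cdot y > 0 \wedge n > 0].$$

The closed form $\vec{a}^n(x, y) = (x - n \cdot y, y)$ can be computed via recurrence solving. Similarly, the cost $(x + \frac{y}{2}) \cdot n - \frac{y}{2} \cdot n^2$ of $n$ loop iterations is obtained by solving the following recurrence relation (where $c^{(n)}$ and $x^{(n)}$ denote the cost and the value of $x$ after $n$ applications of the transition, respectively):

$$c^{(n)} = c^{(n-1)} + x^{(n-1)} = c^{(n-1)} + x - (n - 1) \cdot y \quad\text{and}\quad c^{(0)} = 0.$$

The guard is computed as follows:

$$[\![\top \mid \top \mid x > 0 \wedge y > 0]\!]_{\vec{a}} \rightsquigarrow [\![y > 0 \mid y > 0 \mid x > 0]\!]_{\vec{a}} \rightsquigarrow [\![y > 0 \wedge x - (n - 1) \cdot y > 0 \mid y > 0 \wedge x > 0 \mid \top]\!]_{\vec{a}}$$

In the 1st step, we have $\xi = (y > 0)$ and $\mathit{accel}_{\mathit{inc}}(y > 0, \top, \vec{a}) = (y > 0)$. In the 2nd step, we have $\xi = (x > 0)$ and $\mathit{accel}_{\mathit{dec}}(x > 0, y > 0, \vec{a}) = (x - (n - 1) \cdot y > 0)$. So the inequation $x - (n - 1) \cdot y > 0$ ensures $n$-invariance of $x > 0$.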
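Example 4 can be cross-checked by brute force: executing the original loop $n$ times should succeed exactly when the accelerated guard holds, and should reach the same state at the same total cost. The sketch assumes, as the recurrence above indicates, that each iteration costs the current value of $x$; exact rational arithmetic (`fractions.Fraction`) keeps the $\frac{y}{2}$ terms exact. The function names are hypothetical.

```python
from fractions import Fraction


def run(x, y, n):
    """Execute n iterations of f(x, y) -x-> f(x - y, y) [x > 0 and y > 0].

    Returns ((x, y), cost) or None if the guard fails before n iterations."""
    cost = 0
    for _ in range(n):
        if not (x > 0 and y > 0):
            return None
        cost += x  # the cost of one iteration is the current value of x
        x = x - y
    return (x, y), cost


def accelerated(x, y, n):
    """Accelerated transition computed in Example 4."""
    if not (y > 0 and x - (n - 1) * y > 0 and n > 0):
        return None
    half_y = Fraction(y, 2)
    cost = (x + half_y) * n - half_y * n ** 2
    return (x - n * y, y), cost


def agree(bound=12, max_n=8):
    """Compare both versions on all small inputs."""
    for x in range(-bound, bound):
        for y in range(-bound, bound):
            for n in range(1, max_n + 1):
                if run(x, y, n) != accelerated(x, y, n):
                    return False
    return True
```

For instance, from $(5, 2)$ three iterations cost $5 + 3 + 1 = 9$, and the closed form gives $(5 + 1) \cdot 3 - 1 \cdot 9 = 9$.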


5 Efficient Loop Acceleration Using Unsat Cores 


Each attempt to apply a conditional acceleration technique other than Fixpoint requires proving an implication, which is implemented via SMT solving by proving unsatisfiability of its negation. For Fixpoint, satisfiability of $\mathit{accel}_{\mathit{fp}}(t > 0, \check{\psi}, \vec{a})$ is checked via SMT. So even though LoAT restricts $\xi$ to atoms, up to $O(m^2)$ attempts to apply a conditional acceleration technique are required to accelerate a loop whose guard contains $m$ inequations using a naive strategy ($5 \cdot m$ attempts for the 1st $\rightsquigarrow$-step, $5 \cdot (m - 1)$ attempts for the 2nd step, and so on).

To improve efficiency, LoAT uses a novel encoding that requires just $5 \cdot m$ attempts. For any $\alpha \in \mathcal{AT}_{\mathit{imp}} = \{\mathit{inc}, \mathit{dec}, \mathit{ev\text{-}dec}, \mathit{ev\text{-}inc}\}$, let $\mathit{encode}_\alpha(\xi, \check{\psi}, \vec{a})$ be the implication that has to be valid in order to apply $\mathit{accel}_\alpha$, whose premise is of the form $\ldots \wedge \check{\psi}$. Instead of repeatedly refining $\check{\psi}$, LoAT tries to prove validity² of $\mathit{encode}_{\alpha,\xi} := \mathit{encode}_\alpha(\xi, \varphi \setminus \{\xi\}, \vec{a})$ for each $\alpha \in \mathcal{AT}_{\mathit{imp}}$ and each $\xi \in \varphi$, where $\varphi$ is the (conjunctive) guard of the transition that should be accelerated. Again, proving validity of an implication is equivalent to proving unsatisfiability of its negation. So if validity of $\mathit{encode}_{\alpha,\xi}$ can be shown, then SMT solvers can also provide an unsat core for $\neg\mathit{encode}_{\alpha,\xi}$.

² Here and in the following, we unify conjunctions of atoms with sets of atoms.
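The gap between the two strategies is simple arithmetic: in the worst case, the naive strategy tries all 5 techniques on the $m$, $m-1$, ..., $1$ atoms that remain, i.e., $5 \cdot \frac{m(m+1)}{2}$ attempts, whereas the encoding needs each technique at most once per atom ($4 \cdot m$ implication checks plus at most $m$ Fixpoint checks). The helper names below are hypothetical.

```python
def naive_attempts(m, techniques=5):
    """Worst case of the naive strategy: step k retries all techniques on the m - k + 1 atoms left."""
    return sum(techniques * (m - k + 1) for k in range(1, m + 1))


def encoded_attempts(m, techniques=5):
    """With the unsat-core-based encoding, each technique is tried once per atom."""
    return techniques * m
```

For a guard with 4 inequations, this gives 50 attempts for the naive strategy versus 20 with the encoding.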


Definition 5 (Unsat Core). Given a conjunction $\psi$, we call each unsatisfiable subset of $\psi$ an unsat core of $\psi$.


Theorem 6 shows that when handling an inequation $\xi$, one only has to require $n$-invariance for the elements of $\varphi \setminus \{\xi\}$ that occur in an unsat core of $\neg\mathit{encode}_{\alpha,\xi}$. Thus, an unsat core of $\neg\mathit{encode}_{\alpha,\xi}$ can be used to determine which prerequisites are needed for the inequation $\xi$. This information can then be used to find a suitable order for handling the inequations of the guard. In this way, one only has to check (un)satisfiability of the $4 \cdot m$ formulas $\neg\mathit{encode}_{\alpha,\xi}$. If no such order is found, then LoAT either fails to accelerate the loop under consideration, or it resorts to using Fixpoint, as discussed below.


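Theorem 6 (Unsat Core Induces $\rightsquigarrow$-Step). Let $\mathit{deps}_{\alpha,\xi}$ be the intersection of $\varphi \setminus \{\xi\}$ and an unsat core of $\neg\mathit{encode}_{\alpha,\xi}$. If $\check{\psi}$ implies $\mathit{deps}_{\alpha,\xi}$, then $\mathit{accel}_\alpha(\xi, \check{\psi}, \vec{a}) = \mathit{accel}_\alpha(\xi, \varphi \setminus \{\xi\}, \vec{a})$.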


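Example 7 (Controlling Acceleration Steps via Unsat Cores). Reconsider Example 4. Here, LoAT would try to prove, among others, the following implications:

$$\mathit{encode}_{\mathit{dec},x>0} = (x - y > 0 \wedge y > 0) \implies x > 0 \qquad (1)$$
$$\mathit{encode}_{\mathit{inc},y>0} = (y > 0 \wedge x > 0) \implies y > 0 \qquad (2)$$

To do so, it would try to prove unsatisfiability of $\neg\mathit{encode}_{\alpha,\xi}$ via SMT. For (1), we get $\neg\mathit{encode}_{\mathit{dec},x>0} = (x - y > 0 \wedge y > 0 \wedge x \leq 0)$, whose only unsat core is $\neg\mathit{encode}_{\mathit{dec},x>0}$ itself, and its intersection with $\varphi \setminus \{x > 0\} = \{y > 0\}$ is $\{y > 0\}$.

For (2), we get $\neg\mathit{encode}_{\mathit{inc},y>0} = (y > 0 \wedge x > 0 \wedge y \leq 0)$, whose minimal unsat core is $y > 0 \wedge y \leq 0$, and its intersection with $\varphi \setminus \{y > 0\} = \{x > 0\}$ is empty. So by Theorem 6, we have $\mathit{accel}_{\mathit{inc}}(y > 0, \top, \vec{a}) = \mathit{accel}_{\mathit{inc}}(y > 0, x > 0, \vec{a})$.

In this way, validity of $\mathit{encode}_{\alpha_1,x>0}$ and $\mathit{encode}_{\alpha_2,y>0}$ is proven for all $\alpha_1 \in \mathcal{AT}_{\mathit{imp}} \setminus \{\mathit{inc}\}$ and all $\alpha_2 \in \mathcal{AT}_{\mathit{imp}}$. However, the premise $x \leq x - y \wedge y > 0$ of $\mathit{encode}_{\mathit{ev\text{-}inc},x>0}$ is unsatisfiable and thus a corresponding acceleration step would yield a transition with unsatisfiable guard. To prevent that, LoAT only uses a technique $\alpha \in \mathcal{AT}_{\mathit{imp}}$ for $\xi$ if the premise of $\mathit{encode}_{\alpha,\xi}$ is satisfiable.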
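Definition 5 and Theorem 6 can be illustrated with deletion-based core minimization, a common way to shrink an unsat core to a minimal one. LoAT obtains its cores from the SMT solver, and this sketch does not reproduce that: satisfiability is decided here by enumerating a small integer box, which can confirm satisfiability but cannot prove unsatisfiability over all integers. The atom names are hypothetical labels chosen to match the guard atoms of Example 7.

```python
from itertools import product


def sat_on_box(constraints, bound=6):
    """Bounded stand-in for an SMT check: is there (x, y) in the box satisfying all constraints?"""
    return any(all(pred(x, y) for _, pred in constraints)
               for x, y in product(range(-bound, bound), repeat=2))


def minimal_core(constraints):
    """Deletion-based minimization: drop each constraint whose removal keeps the set unsat."""
    if sat_on_box(constraints):
        return None  # no unsat core: the conjunction is satisfiable
    core = list(constraints)
    for item in list(core):
        rest = [c for c in core if c is not item]
        if not sat_on_box(rest):
            core = rest
    return core


def deps(neg_encoding, xi, guard):
    """Intersection of (guard without xi) with a minimal unsat core, as in Theorem 6."""
    core_names = {name for name, _ in minimal_core(neg_encoding)}
    return core_names & (set(guard) - {xi})


GUARD = ["x>0", "y>0"]
# Negation of (1): x - y > 0 and y > 0 and x <= 0
NEG_DEC_X = [("x-y>0", lambda x, y: x - y > 0),
             ("y>0", lambda x, y: y > 0),
             ("x<=0", lambda x, y: x <= 0)]
# Negation of (2): y > 0 and x > 0 and y <= 0
NEG_INC_Y = [("y>0", lambda x, y: y > 0),
             ("x>0", lambda x, y: x > 0),
             ("y<=0", lambda x, y: y <= 0)]
```

As in the example, the Decrease step for $x > 0$ depends on $y > 0$, while the Increase step for $y > 0$ has no dependencies.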


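So for each inequation $\xi$ from $\varphi$, LoAT synthesizes up to 4 potential $\rightsquigarrow$-steps corresponding to $\mathit{accel}_\alpha(\xi, \mathit{deps}_{\alpha,\xi}, \vec{a})$, where $\alpha \in \mathcal{AT}_{\mathit{imp}}$. If validity of $\mathit{encode}_{\alpha,\xi}$ cannot be shown for any $\alpha \in \mathcal{AT}_{\mathit{imp}}$, then LoAT tries to prove satisfiability of $\mathit{accel}_{\mathit{fp}}(\xi, \top, \vec{a})$ to see if Fixpoint should be applied. Note that the 2nd argument of $\mathit{accel}_{\mathit{fp}}$ is irrelevant, i.e., Fixpoint does not benefit from previous acceleration steps and thus $\rightsquigarrow$-steps that use it do not have any dependencies.

It remains to find a suitably ordered subset $S$ of $m$ $\rightsquigarrow$-steps that constitutes a successful $\rightsquigarrow$-sequence. In the following, we define $\mathcal{AT} := \mathcal{AT}_{\mathit{imp}} \cup \{\mathit{fp}\}$ and we extend the definition of $\mathit{deps}_{\alpha,\xi}$ to the case $\alpha = \mathit{fp}$ by defining $\mathit{deps}_{\mathit{fp},\xi} := \emptyset$.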
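Lemma 8. Let $C \subseteq \mathcal{AT} \times \varphi$ be the smallest set such that $(\alpha, \xi) \in C$ implies

(a) if $\alpha \in \mathcal{AT}_{\mathit{imp}}$, then $\mathit{encode}_{\alpha,\xi}$ is valid and its premise is satisfiable,

(b) if $\alpha = \mathit{fp}$, then $\mathit{accel}_{\mathit{fp}}(\xi, \top, \vec{a})$ is satisfiable, and

(c) $\mathit{deps}_{\alpha,\xi} \subseteq \{\xi' \mid (\alpha', \xi') \in C \text{ for some } \alpha' \in \mathcal{AT}\}$.

Let $S := \{(\alpha, \xi) \in C \mid \alpha \geq_{\mathcal{AT}} \alpha' \text{ for all } (\alpha', \xi) \in C\}$ where $\geq_{\mathcal{AT}}$ is the total order $\mathit{inc} >_{\mathcal{AT}} \mathit{dec} >_{\mathcal{AT}} \mathit{ev\text{-}dec} >_{\mathcal{AT}} \mathit{ev\text{-}inc} >_{\mathcal{AT}} \mathit{fp}$. We define $(\alpha', \xi') \prec (\alpha, \xi)$ if $\xi' \in \mathit{deps}_{\alpha,\xi}$. Then $\prec$ is a strict (and hence, well-founded) order on $S$.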


The order $>_{\mathcal{AT}}$ in Lemma 8 corresponds to the order proposed in [10]. Note that the set $C$ can be computed without further (potentially expensive) SMT queries by a straightforward fixpoint iteration, and well-foundedness of $\prec$ follows from minimality of $C$. For Example 7, we get

$$C = \{(\mathit{dec}, x > 0), (\mathit{ev\text{-}dec}, x > 0)\} \cup \{(\alpha, y > 0) \mid \alpha \in \mathcal{AT}\}$$

and

$$S = \{(\mathit{dec}, x > 0), (\mathit{inc}, y > 0)\} \quad\text{with}\quad (\mathit{inc}, y > 0) \prec (\mathit{dec}, x > 0).$$

Finally, we can construct a valid $\rightsquigarrow$-sequence via the following theorem.


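Theorem 9 (Finding $\rightsquigarrow$-Sequences). Let $S$ be defined as in Lemma 8 and assume that for each $\xi \in \varphi$, there is an $\alpha \in \mathcal{AT}$ such that $(\alpha, \xi) \in S$. W.l.o.g., let $\varphi = \bigwedge_{i=1}^{m} \xi_i$ where $(\alpha_1, \xi_1) \prec' \ldots \prec' (\alpha_m, \xi_m)$ for some strict total order $\prec'$ containing $\prec$, and let $\check{\varphi}_j := \bigwedge_{i=1}^{j} \xi_i$. Then for all $j \in [0, m)$, we have:

$$\Big[\!\!\Big[\bigwedge_{i=1}^{j} \mathit{accel}_{\alpha_i}(\xi_i, \check{\varphi}_{i-1}, \vec{a}) \;\Big|\; \check{\varphi}_j \;\Big|\; \bigwedge_{i=j+1}^{m} \xi_i\Big]\!\!\Big]_{\vec{a}} \rightsquigarrow \Big[\!\!\Big[\bigwedge_{i=1}^{j+1} \mathit{accel}_{\alpha_i}(\xi_i, \check{\varphi}_{i-1}, \vec{a}) \;\Big|\; \check{\varphi}_{j+1} \;\Big|\; \bigwedge_{i=j+2}^{m} \xi_i\Big]\!\!\Big]_{\vec{a}}$$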

In our example, we have $\prec' = \prec$ as $\prec$ is total. Thus, we obtain a $\rightsquigarrow$-sequence by first processing $y > 0$ with Increase and then processing $x > 0$ with Decrease.
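The fixpoint iteration for $C$, the selection of $S$, and the ordering used in Theorem 9 can be sketched as follows. This is one possible reading of the construction, not LoAT's code: candidate steps that already passed conditions (a) and (b) are given as input together with their dependency sets, a step is added to $C$ once all of its dependencies are handled by some step in $C$, and a topological sort returns `None` if no suitable order exists. The dependency set of $(\mathit{ev\text{-}dec}, x > 0)$ is not stated in the text and is assumed to be $\{y > 0\}$ here; all names are hypothetical.

```python
# Techniques from largest to smallest w.r.t. the order inc > dec > ev-dec > ev-inc > fp.
ORDER = ["inc", "dec", "ev-dec", "ev-inc", "fp"]

# Hypothetical input for Example 7: (technique, atom) -> dependencies.
CANDIDATES = {
    ("dec", "x>0"): {"y>0"},
    ("ev-dec", "x>0"): {"y>0"},  # assumption: not stated in the text
    ("inc", "y>0"): set(),
    ("dec", "y>0"): set(),
    ("ev-dec", "y>0"): set(),
    ("ev-inc", "y>0"): set(),
    ("fp", "y>0"): set(),
}


def compute_C(candidates):
    """Add (technique, atom) once every atom in its dependencies is handled by some pair in C."""
    C = set()
    changed = True
    while changed:
        changed = False
        handled = {xi for _, xi in C}
        for pair, dep in candidates.items():
            if pair not in C and dep <= handled:
                C.add(pair)
                changed = True
    return C


def compute_S(C):
    """Keep, for each atom, only the pair whose technique is largest w.r.t. ORDER."""
    best = {}
    for alpha, xi in C:
        if xi not in best or ORDER.index(alpha) < ORDER.index(best[xi]):
            best[xi] = alpha
    return {(alpha, xi) for xi, alpha in best.items()}


def order_steps(S, candidates):
    """Topologically sort S so that each step comes after the atoms it depends on."""
    remaining, done, result = set(S), set(), []
    while remaining:
        ready = sorted(p for p in remaining if candidates[p] <= done)
        if not ready:
            return None  # cyclic dependencies: no suitable order found
        for p in ready:
            result.append(p)
            done.add(p[1])
            remaining.discard(p)
    return result
```

On this input, the sketch reproduces the example: $C$ contains seven pairs, $S = \{(\mathit{dec}, x > 0), (\mathit{inc}, y > 0)\}$, and $y > 0$ is processed with Increase before $x > 0$ is processed with Decrease.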


6 Proving Non-Termination of Simple Loops 


To prove non-termination, LoAT uses a variation of the calculus from Sect. 4, see [11]. To adapt it for proving non-termination, further restrictions have to be imposed on the conditional acceleration techniques, resulting in the notion of conditional non-termination techniques, see [11, Def. 10]. We denote a $\rightsquigarrow$-step that uses a conditional non-termination technique with $\rightsquigarrow_{\mathit{nt}}$.


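Theorem 10 (Proving Non-Termination via $\rightsquigarrow_{\mathit{nt}}$). Let $f(\vec{x}) \to f(\vec{a})\ [\varphi] \in \mathcal{T}$. If $[\![\top \mid \top \mid \varphi]\!]_{\vec{a}} \rightsquigarrow_{\mathit{nt}}^{*} [\![\psi \mid \varphi \mid \top]\!]_{\vec{a}}$, then for every integer vector $\vec{c}$ where $\psi(\vec{c})$ is satisfiable, the configuration $f(\vec{c})$ admits an infinite $\to_{\mathcal{T}}$-sequence.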


The conditional non-termination techniques used by LoAT are Increase, Eventual Increase, and Fixpoint. So non-termination proofs can be synthesized while trying to accelerate a loop with very little overhead. After successfully accelerating a loop as explained in Sect. 5, LoAT tries to find a second suitably ordered $\rightsquigarrow$-sequence, where it only considers the conditional non-termination techniques mentioned above. If LoAT succeeds, then it has found a $\rightsquigarrow_{\mathit{nt}}$-sequence which gives rise to a proof of non-termination via Theorem 10.
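Because the second search only admits Increase, Eventual Increase, and Fixpoint, a quick necessary condition is that every guard atom has at least one candidate step using one of these techniques. The sketch below checks only this coverage condition (not the ordering). The candidate data for Example 4 is a hypothetical encoding of the set $C$ from Example 7, with the dependency of $(\mathit{ev\text{-}dec}, x > 0)$ assumed. There, $x > 0$ is covered only by Decrease and Eventual Decrease, so no $\rightsquigarrow_{\mathit{nt}}$-sequence exists, which is consistent with the loop of Example 4 terminating.

```python
NT_TECHNIQUES = {"inc", "ev-inc", "fp"}  # Increase, Eventual Increase, Fixpoint

# Hypothetical candidate steps for Example 4: (technique, atom) -> dependencies.
CANDIDATES = {
    ("dec", "x>0"): {"y>0"},
    ("ev-dec", "x>0"): {"y>0"},  # assumption: not stated in the text
    ("inc", "y>0"): set(),
    ("dec", "y>0"): set(),
    ("ev-dec", "y>0"): set(),
    ("ev-inc", "y>0"): set(),
    ("fp", "y>0"): set(),
}


def atoms_without_nt_step(candidates, guard_atoms):
    """Guard atoms for which no candidate step uses a conditional non-termination technique."""
    covered = {xi for (alpha, xi) in candidates if alpha in NT_TECHNIQUES}
    return set(guard_atoms) - covered
```

By contrast, for a loop such as $f(x) \to f(x + 1)\ [x > 0]$, Increase applies to $x > 0$, so the coverage condition is met.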
7 Implementation, Experiments, and Conclusion


Our implementation in LoAT can parse three widely used formats for ITSs (see 
[13]), and it is configurable via a minimalistic set of command-line options: 


--timeout to set a timeout in seconds 

--proof-level to set the verbosity of the proof output 

--plain to switch from colored to monochrome proof output
--limit-strategy to choose a strategy for solving limit problems, see [9] 
--mode to choose an analysis mode for LoAT (complexity or non_termination) 


We evaluate three versions of LoAT: LoAT '19 uses templates to find invariants that facilitate loop acceleration for proving non-termination [8]; LoAT '20 deduces worst-case lower bounds based on loop acceleration via metering functions [9]; and LoAT '22 applies the calculus from [10,11] as described in Sects. 5 and 6. We also include three other state-of-the-art termination tools in our evaluation: T2 [6], VeryMax [16], and iRankFinder [3,7]. Regarding complexity, the only other tool for worst-case lower bounds of ITSs is LOBER [1]. However, we do not compare with LOBER, as it only analyses (multi-path) loops instead of full ITSs.

We use the examples from the categories Termination (1222 examples) and Complexity of ITSs (781 examples), respectively, of the Termination Problems Data Base [19]. All benchmarks have been performed on StarExec [18] (Intel Xeon E5-2609, 2.40 GHz, 264 GB RAM [17]) with a wall clock timeout of 300 s.


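Left table (non-termination; RT = wall clock runtime in seconds):

Tool          No    Yes   Avg. RT   Median RT   Std. Dev. RT
LoAT '22      493     0       9.4         0.2           41.5
LoAT '19      459     0      22.6         1.5           67.5
T2            438   610      22.6          12           66.7
VeryMax       419   628      29.9         1.0           66.7
iRankFinder   399   634      44.1         4.9           89.1

Right table (complexity; rows: result of LoAT '20, columns: result of LoAT '22; – denotes no examples):

LoAT '20 \ LoAT '22   Ω(1)   Ω(n)   Ω(n²)   Ω(n^>2)   EXP   Ω(ω)
Ω(1)                   180     63       1         –     –     12
Ω(n)                     6    218       3         –     –      –
Ω(n²)                    –      1      69         –     –      –
Ω(n^>2)                  –      –       –         7     –      –
EXP                      1      –       –         –     4      –
Ω(ω)                     –      –       –         –     –    216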


By the table on the left, LoAT '22 is the most powerful tool for non- 
termination. The improvement over LoAT '19 demonstrates that the calculus 
from [10,11] is more powerful and efficient than the approach from [8]. The last 
three columns show the average, the median, and the standard deviation of the 
wall clock runtime, including examples where the timeout was reached. 

The table on the right shows the results for complexity. The diagonal cor- 
responds to examples where LoAT '20 and LoAT '22 yield the same result. The 
entries above or below the diagonal correspond to examples where LoAT '22 or 
LoAT '20 is better, respectively. There are 8 regressions and 79 improvements, 
so the calculus from [10,11] used by LoAT '22 is also beneficial for lower bounds. 
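The reconstructed complexity table can be cross-checked against the counts stated in the text: the entries above the diagonal (improvements of LoAT '22) should sum to 79, those below it (regressions) to 8, and all entries to the 781 examples of the complexity category. The sketch reads "–" as 0; the names are hypothetical.

```python
# Rows: result of LoAT '20, columns: result of LoAT '22, in the order
# Omega(1), Omega(n), Omega(n^2), Omega(n^>2), EXP, Omega(omega); "-" is read as 0.
MATRIX = [
    [180, 63, 1, 0, 0, 12],
    [6, 218, 3, 0, 0, 0],
    [0, 1, 69, 0, 0, 0],
    [0, 0, 0, 7, 0, 0],
    [1, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 216],
]


def summarize(m):
    """Return (improvements above the diagonal, regressions below it, unchanged results)."""
    n = len(m)
    improvements = sum(m[i][j] for i in range(n) for j in range(n) if j > i)
    regressions = sum(m[i][j] for i in range(n) for j in range(n) if j < i)
    same = sum(m[i][i] for i in range(n))
    return improvements, regressions, same
```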

LoAT is open source and its source code is available on GitHub [12]. See 
[13,14] for details on our evaluation, related work, all proofs, and a pre-compiled 
binary. 
Abstract. Definition packages in theorem provers provide users with
means of defining and organizing concepts of interest. This system 
description presents a new definition package for the hybrid systems the- 
orem prover KeYmaera X based on differential dynamic logic (dL). The 
package adds KeYmaera X support for user-defined smooth functions 
whose graphs can be implicitly characterized by dL formulas. Notably, 
this makes it possible to implicitly characterize functions, such as the 
exponential and trigonometric functions, as solutions of differential equa- 
tions and then prove properties of those functions using dL’s differ- 
ential equation reasoning principles. Trustworthiness of the package is 
achieved by minimally extending KeYmaera X’s soundness-critical ker- 
nel with a single axiom scheme that expands function occurrences with 
their implicit characterization. Users are provided with a high-level inter- 
face for defining functions and non-soundness-critical tactics that auto- 
mate low-level reasoning over implicit characterizations in hybrid system 
proofs. 


Keywords: Definitions · Differential dynamic logic · Verification of hybrid systems · Theorem proving


1 Introduction 


KeYmaera X [7] is a theorem prover implementing differential dynamic logic 
dL [17,19-21] for specifying and verifying properties of hybrid systems mixing 
discrete dynamics and differential equations. Definitions enable users to express 
complex theorem statements in concise terms, e.g., by modularizing hybrid sys- 
tem models and their proofs [14]. Prior to this work, KeYmaera X had only one 
mechanism for definition, namely, non-recursive abbreviations via uniform sub- 
stitution [14,20]. This restriction meant that common and useful functions, e.g., 
the trigonometric and exponential functions, could not be directly used in KeY- 
maera X, even though they can be uniquely characterized by dL formulas [17]. 
This system description introduces a new KeYmaera X definitional mecha- 
nism where functions are implicitly defined in dL as solutions of ordinary dif- 
ferential equations (ODEs). Although definition packages are available in most 
general-purpose proof assistants, our package is novel in tackling the question of how best to support user-defined functions in the domain-specific setting for hybrid systems. In contrast to tools with built-in support for some fixed subsets of special functions [1,9,23], or higher-order logics that can work with functions via their infinitary series expansions [4], e.g., $\exp(t) = \sum_{i=0}^{\infty} \frac{t^i}{i!}$, our package strikes a balance between practicality and generality by allowing users to define and reason about any function characterizable in dL as the solution of an ODE (Sect. 2), e.g., $\exp(t)$ solves the ODE $e' = e$ with initial value $e(0) = 1$.

Theoretically, implicit definitions strictly expand the class of ODE invariants 
amenable to dL’s complete ODE invariance proof principles [22]; such invariants 
play a key role in ODE safety proofs [21] (see Proposition 3). In practice, arithmetical identities and other specifications involving user-defined functions are
proved by automatically unfolding their implicit ODE characterizations and re- 
using existing KeYmaera X support for ODE reasoning (Sect. 3). The package is 
designed to provide seamless integration of implicit definitions in KeYmaera X 
and its usability is demonstrated on several hybrid system verification examples 
drawn from the literature that involve special functions (Sect. 4). 

All proofs are in the supplement [8]. The definitions package is part of KeY- 
maera X with a usage guide at: http://keymaeraX.org/keymaeraXfunc/. 


2 Interpreted Functions in Differential Dynamic Logic 


This section briefly recalls differential dynamic logic (dL) [17,18,20,21] and 
explains how its term language is extended to support implicit function defi- 
nitions. 


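Syntax. Terms $e, \tilde{e}$ and formulas $\varphi, \psi$ in dL are generated by the following grammar, with variable $x$, rational constant $c$, $k$-ary function symbols $h$ (for any $k \in \mathbb{N}$), comparison operator $\sim \in \{=, \neq, \geq, >, \leq, <\}$, and hybrid program $\alpha$:

$$e, \tilde{e} ::= x \mid c \mid e + \tilde{e} \mid e \cdot \tilde{e} \mid h(e_1, \ldots, e_k) \qquad (1)$$

$$\varphi, \psi ::= e \sim \tilde{e} \mid \varphi \wedge \psi \mid \varphi \vee \psi \mid \neg\varphi \mid \forall x\, \varphi \mid \exists x\, \varphi \mid [\alpha]\varphi \mid \langle\alpha\rangle\varphi \qquad (2)$$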


The terms and formulas above extend the first-order language of real arithmetic (FOL$_{\mathbb{R}}$) with the box ($[\alpha]\varphi$) and diamond ($\langle\alpha\rangle\varphi$) modality formulas which express that all or some runs of hybrid program $\alpha$ satisfy postcondition $\varphi$, respectively. Table 1 gives an intuitive overview of dL's hybrid programs language for modeling systems featuring discrete and continuous dynamics and their interactions. In dL's uniform substitution calculus, function symbols $h$ are uninterpreted, i.e., they semantically correspond to an arbitrary (smooth) function. Such uninterpreted function symbols (along with uninterpreted predicate and program symbols) are crucially used to give a parsimonious axiomatization of dL based on uniform substitution [20] which, in turn, enables a trustworthy microkernel implementation of the logic in the theorem prover KeYmaera X [7,16].
Table 1. Syntax and informal semantics of hybrid programs

Program          Behavior
?φ               Stay in the current state if φ is true, otherwise abort and discard run
x := e           Store the value of term e in variable x
x := *           Store an arbitrary real value in variable x
x' = f(x) & Q    Continuously follow ODE x' = f(x) in domain Q for any duration ≥ 0
if(φ) α          Run program α if φ is true, otherwise skip. Definable by ?φ; α ∪ ?¬φ
α; β             Run program α, then run program β in any resulting state(s)
α ∪ β            Nondeterministically run either program α or program β
α*               Nondeterministically repeat program α for n iterations, for any n ∈ ℕ
{α}              For readability, braces are used to group and delimit hybrid programs


[Figure 1 panels: "Hybrid program model (auxiliary variables s, c)", "Hybrid program model (trigonometric functions)", "dL safety specification"]

Fig. 1. Running example of a swinging pendulum driven by an external force (left), its hybrid program models and dL safety specification (right). Program $\alpha_s$ uses trigonometric functions directly, while program $\hat{\alpha}_s$ uses variables $s, c$ to implicitly track the values of $\sin(\theta)$ and $\cos(\theta)$, respectively (additions in red). The implicit characterizations $\varphi_{\sin}(s, \theta)$, $\varphi_{\cos}(c, \theta)$ are defined in (4), (5) and are not repeated here for brevity. (Color figure online)


Running Example. Adequate modeling of hybrid systems often requires the use of interpreted function symbols that denote specific functions of interest. As a running example, consider the swinging pendulum shown in Fig. 1. The ODEs describing its continuous motion are $\theta' = \omega, \omega' = -\frac{g}{L}\sin(\theta) - k\omega$, where $\theta$ is the swing angle, $\omega$ is the angular velocity, and $g, k, L$ are the gravitational constant, coefficient of friction, and length of the rigid rod suspending the pendulum, respectively. The hybrid program $\alpha_s$ models an external force that repeatedly pushes the pendulum and changes its angular velocity by a nondeterministically chosen value $p$; the guard if(...) condition is designed to ensure that the push does not cause the pendulum to swing above the horizontal as specified by $\varphi_s$. Importantly, the function symbols sin, cos must denote the usual real trigonometric functions in $\alpha_s$. Program $\hat{\alpha}_s$ shows the same pendulum modeled in dL without the use of interpreted symbols, but instead using auxiliary variables $s, c$. Note that $\hat{\alpha}_s$ is cumbersome and subtle to get right: the implicit characterizations $\varphi_{\sin}(s, \theta)$, $\varphi_{\cos}(c, \theta)$ from (4), (5) are lengthy and the differential equations $s' = \omega c$, $c' = -\omega s$ must be manually calculated and added to ensure that $s, c$ correctly track the trigonometric functions as $\theta$ evolves continuously [18,22].


Interpreted Functions. To enable extensible use of interpreted functions in dL, the term grammar (1) is enriched with $k$-ary function symbols $h$ that carry an interpretation annotation [5,27], $h_{\langle\!\langle\varphi\rangle\!\rangle}$, where $\varphi \equiv \varphi(x_0, y_1, \ldots, y_k)$ is a dL formula with free variables in $x_0, y_1, \ldots, y_k$ and no uninterpreted symbols. Intuitively, $\varphi$ is a formula that characterizes the graph of the intended interpretation for $h$, where $y_1, \ldots, y_k$ are inputs to the function and $x_0$ is the output. Since $\varphi$ depends only on the values of its free variables, its formula semantics $[\![\varphi]\!]$ can be equivalently viewed as a subset of Euclidean space $[\![\varphi]\!] \subseteq \mathbb{R} \times \mathbb{R}^k$ [20,21]. The dL term semantics $\nu[\![e]\!]$ [20,21] in a state $\nu$ is extended with a case for terms $h_{\langle\!\langle\varphi\rangle\!\rangle}(e_1, \ldots, e_k)$ by evaluation of the smooth $C^\infty$ function characterized by $[\![\varphi]\!]$:

$$\nu[\![h_{\langle\!\langle\varphi\rangle\!\rangle}(e_1, \ldots, e_k)]\!] = \begin{cases} h(\nu[\![e_1]\!], \ldots, \nu[\![e_k]\!]) & \text{if } [\![\varphi]\!] \text{ is the graph of a smooth } h : \mathbb{R}^k \to \mathbb{R} \\ 0 & \text{otherwise} \end{cases}$$

This semantics says that, if the relation $[\![\varphi]\!] \subseteq \mathbb{R} \times \mathbb{R}^k$ is the graph of some smooth $C^\infty$ function $h : \mathbb{R}^k \to \mathbb{R}$, then the annotated syntactic symbol $h_{\langle\!\langle\varphi\rangle\!\rangle}$ is interpreted semantically as $h$. Note that the graph relation uniquely defines $h$ (if it exists). Otherwise, $h_{\langle\!\langle\varphi\rangle\!\rangle}$ is interpreted as the constant zero function, which ensures that the term semantics remain well-defined for all terms. An alternative is to leave the semantics of some terms (possibly) undefined, but this would require more extensive changes to the semantics of dL and extra case distinctions during proofs [2].


Axiomatics and Differentially-Defined Functions. To support reasoning 
for implicit definitions, annotated interpretations are reified to characterization 
axioms for expanding interpreted functions in the following lemma. 


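Lemma 1 (Function interpretation). The FI axiom (below) for dL is sound where $h$ is a $k$-ary function symbol and the formula semantics $[\![\varphi]\!]$ is the graph of a smooth $C^\infty$ function $h : \mathbb{R}^k \to \mathbb{R}$.

$$\mathrm{FI} \qquad e_0 = h_{\langle\!\langle\varphi\rangle\!\rangle}(e_1, \ldots, e_k) \leftrightarrow \varphi(e_0, e_1, \ldots, e_k)$$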


Axiom FI enables reasoning for terms $h_{\langle\!\langle\varphi\rangle\!\rangle}(e_1, \ldots, e_k)$ through their implicit interpretation $\varphi$, but Lemma 1 does not directly yield an implementation because it has a soundness-critical side condition that interpretation $\varphi$ characterizes the graph of a smooth $C^\infty$ function. It is possible to syntactically characterize this side condition [2], e.g., the formula $\forall y_1, \ldots, y_k\, \exists x_0\, \varphi(x_0, y_1, \ldots, y_k)$ expresses that the graph represented by $\varphi$ has at least one output value $x_0$ for each input value $y_1, \ldots, y_k$, but this burdens users with the task of proving this side condition in dL before working with their desired function. The KeYmaera X definition package opts for a middle ground between generality and ease-of-use by implementing FI for univariate, differentially-defined functions, i.e., the interpretation $\varphi$ has the following shape, where $x = (x_0, x_1, \ldots, x_n)$ abbreviates a vector of variables, there is one input $t = y_1$, and $X = (X_0, X_1, \ldots, X_n)$, $T$ are dL terms that do not mention any free variables, e.g., are rational constants, which have constant value in any dL state:


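$$\varphi(x_0, t) \equiv \exists x_1, \ldots, x_n\, \langle \{x' = -f(x, t), t' = -1\} \cup \{x' = f(x, t), t' = 1\} \rangle (x = X \wedge t = T) \qquad (3)$$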


Formula (3) says that from point $x_0$, there exists a choice of the remaining coordinates $x_1, \ldots, x_n$ such that it is possible to follow the defining ODE either forward $x' = f(x, t), t' = 1$ or backward $x' = -f(x, t), t' = -1$ in time to reach the initial values $x = X$ at time $t = T$. In other words, the implicitly defined function $h_{\langle\!\langle\varphi(x_0, t)\rangle\!\rangle}$ is the $x_0$-coordinate projected solution of the ODE starting from initial values $X$ at initial time $T$. For example, the trigonometric functions used in Fig. 1 are differentially-definable as respective projections:


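$$\varphi_{\sin}(s, t) \equiv \exists c\, \langle \{s' = -c, c' = s, t' = -1\} \cup \{s' = c, c' = -s, t' = 1\} \rangle (s = 0 \wedge c = 1 \wedge t = 0) \qquad (4)$$

$$\varphi_{\cos}(c, t) \equiv \exists s\, \langle \{s' = -c, c' = s, t' = -1\} \cup \{s' = c, c' = -s, t' = 1\} \rangle (s = 0 \wedge c = 1 \wedge t = 0) \qquad (5)$$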


By Picard-Lindelöf [21, Thm. 2.2], the ODE $x' = f(x, t)$ has a unique solution $\Phi : (a, b) \to \mathbb{R}^{n+1}$ (through the initial values $x = X$ at time $t = T$) on an open interval $(a, b)$ for some $-\infty \leq a < b \leq \infty$. Moreover, $\Phi(t)$ is $C^\infty$ smooth in $t$ because the ODE right-hand sides are dL terms with smooth interpretations [20]. Therefore, the side condition for Lemma 1 reduces to showing that $\Phi$ exists globally, i.e., it is defined on $t \in (-\infty, \infty)$.
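Formulas (4) and (5) can be illustrated numerically: following the defining ODE forward (positive step) or backward (negative step) from $s = 0$, $c = 1$ at $t = 0$ should reproduce $\sin$ and $\cos$. This is a floating-point sketch using the classical fourth-order Runge-Kutta method, not how KeYmaera X treats these functions (it reasons deductively); the step count and function names are arbitrary, hypothetical choices.

```python
import math


def rk4(f, state, t, h, steps):
    """Classical fourth-order Runge-Kutta for state' = f(state, t) with step size h."""
    for _ in range(steps):
        k1 = f(state, t)
        k2 = f([s + h / 2 * k for s, k in zip(state, k1)], t + h / 2)
        k3 = f([s + h / 2 * k for s, k in zip(state, k2)], t + h / 2)
        k4 = f([s + h * k for s, k in zip(state, k3)], t + h)
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        t += h
    return state


def sin_cos_ode(state, t):
    """Defining ODE of (4), (5): s' = c, c' = -s."""
    s, c = state
    return [c, -s]


def differential_sin_cos(t_end, steps=1000):
    """Follow the ODE from s = 0, c = 1 at t = 0 to t_end; a negative step integrates backward."""
    h = t_end / steps
    s, c = rk4(sin_cos_ode, [0.0, 1.0], 0.0, h, steps)
    return s, c
```

Since this linear ODE has a solution for all $t$, the global-existence side condition discussed above is met for $\sin$ and $\cos$.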


Lemma 2 (Smooth interpretation). If formula $\exists x_0\, \varphi(x_0, t)$ is valid, then $\varphi(x_0, t)$ from (3) characterizes a smooth $C^\infty$ function and axiom FI is sound for $\varphi(x_0, t)$.


Lemma 2 enables an implementation of axiom FI in KeYmaera X that com- 
bines a syntactic check (the interpretation has the shape of formula (3)) and a 
side condition check (requiring users to prove existence for their interpretations). 

The addition of differentially-defined functions to dL strictly increases the 
deductive power of ODE invariants, a key tool in deductive ODE safety reason- 
ing [21]. Intuitively, the added functions allow direct, syntactic descriptions of 
invariants, e.g., the exponential or trigonometric functions, that have effective 
invariance proofs using dL’s complete ODE invariance reasoning principles [22]. 
Proposition 3 (Invariant expressivity). There are valid polynomial dL differential equation safety properties which are provable using differentially-defined function invariants but are not provable using polynomial invariants.


3 KeYmaera X Implementation 


The implicit definition package adds interpretation annotations and axiom FI based on Lemma 2 in approximately 170 lines of code extensions to KeYmaera X's soundness-critical core [7,16]. This section focuses on non-soundness-critical usability features provided by the package that build on those core changes.


3.1 Core-Adjacent Changes 


KeYmaera X has a browser-based user interface with concrete, ASCII-based 
dL syntax [14]. The package extends KeYmaera X’s parsers and pretty printers 
with support for interpretation annotations h«...»(...) and users can simulta- 
neously define a family of functions as respective coordinate projections of the 
solution of an n-dimensional ODE (given initial conditions) with sugared syntax: 


implicit Real h1(Real t), ..., hn(Real t) = {{initcond}; {ODE}}


For example, the implicit definitions (4), (5) can be written with the following sugared syntax; KeYmaera X automatically inserts the associated interpretation annotations for the trigonometric function symbols (see the supplement [8] for a KeYmaera X snippet of formula $\varphi_s$ from Fig. 1 using this sugared definition).


implicit Real sin(Real t), cos(Real t)
  = {{sin:=0; cos:=1;}; {sin'=cos, cos'=-sin}}


In fact, the functions sin, cos, exp are so ubiquitous in hybrid system models
that the package builds their definitions in automatically without requiring users 
to write them explicitly. In addition, although arithmetic involving those func- 
tions is undecidable [11,24], KeYmaera X can export those functions whenever 
its external arithmetic tools have partial arithmetic support for those functions. 


3.2 Intermediate and User-Level Proof Automation 


The package automatically proves three important lemmas about user-defined 
functions that can be transparently re-used in all subsequent proofs: 


1. It proves the side condition of axiom FI using KeYmaera X’s automation 
for proving sufficient duration existence of solutions for ODEs [26] which 
automatically shows global existence of solutions for all affine ODEs and 
some univariate nonlinear ODEs. As an example of the latter, the hyperbolic 
tanh function is differentially-defined as the solution of ODE $x' = 1 - x^2$ with
initial value x = 0 at t = 0 whose global existence is proved automatically. 
2. It proves that the functions have initial values as specified by their interpretation, e.g., $\sin(0) = 0$, $\cos(0) = 1$, and $\tanh(0) = 0$.

3. It proves the differential axiom [20] for each function that is used to enable syntactic derivative calculations in dL, e.g., the differential axioms for sin, cos are $(\sin(e))' = \cos(e)(e)'$ and $(\cos(e))' = -\sin(e)(e)'$, respectively. Briefly, these axioms are automatically derived in a correct-by-construction manner using dL's syntactic version of the chain rule for differentials [20, Fig. 3], so the rate of change of $\sin(e)$ is the rate of change of $\sin(\cdot)$ with respect to its argument $e$, multiplied by the rate of change of its argument $(e)'$.
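The three lemmas can be spot-checked numerically for tanh. The sketch below (a floating-point illustration, not a substitute for the deductive proofs) integrates $x' = 1 - x^2$ from $x = 0$ at $t = 0$ with the classical Runge-Kutta method and compares the result with `math.tanh`, then checks the chain-rule shape of the differential axioms with a central finite difference. Step sizes, tolerances, and function names are hypothetical choices.

```python
import math


def solve_scalar_ode(f, x0, t_end, steps=2000):
    """RK4 for x' = f(x) from x(0) = x0 to t_end (a negative t_end integrates backward)."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + h / 2 * k1)
        k3 = f(x + h / 2 * k2)
        k4 = f(x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def tanh_ode(x):
    """Defining ODE of tanh from item 1: x' = 1 - x^2."""
    return 1 - x * x


def derivative(g, e, h=1e-5):
    """Central finite-difference approximation of g'(e)."""
    return (g(e + h) - g(e - h)) / (2 * h)
```

For instance, the derivative of $\sin(v^2)$ at $v = 0.7$ should match $\cos(0.49) \cdot 1.4$, i.e., $\cos(e)(e)'$ with $e = v^2$, and the derivative of $\tanh(3v)$ should match $(1 - \tanh(3v)^2) \cdot 3$.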


These lemmas enable the use of differentially-defined functions with all existing
ODE automation in KeYmaera X [22,26]. In particular, since differentially-defined
functions are univariate Noetherian functions, they admit complete ODE
invariance reasoning principles in dL [22], as implemented in KeYmaera X.

The package also adds specialized support for arithmetical reasoning over
differential definitions to supplement external arithmetic tools in proofs. First,
it allows users to manually prove identities and bounds using KeYmaera X's
ODE reasoning. For example, the bound tanh(λx)^2 < 1 used in example α_n
from Sect. 4 is proved by differential unfolding as follows (see supplement [8]):


$$\frac{\vdash \tanh(0)^2 < 1 \qquad \tanh(\lambda v)^2 < 1 \vdash [\{v' = 1 \,\&\, v \le x\} \cup \{v' = -1 \,\&\, v \ge x\}]\, \tanh(\lambda v)^2 < 1}{\vdash \tanh(\lambda x)^2 < 1}$$


This deduction step says that, to show the conclusion (below the rule bar), it
suffices to prove the premises (above the rule bar), i.e., that the bound is true at v = 0
(left premise) and that it is preserved as v evolves forward (v' = 1) or backward
(v' = −1) along the real line until it reaches x (right premise). The left premise is
proved using the initial value lemma for tanh, while the right premise is proved
by ODE invariance reasoning with the differential axiom for tanh [22].

Second, the package uses KeYmaera X's uniform substitution mechanism [20]
to implement (untrusted) abstraction of functions by fresh variables when
solving arithmetic subgoals; e.g., the following arithmetic bound for example α_n
is proved by abstraction after adding the bounds tanh(λx)^2 < 1 and tanh(λy)^2 < 1:


Bound: $$x\,(\tanh(\lambda x) - \tanh(\lambda y)) + y\,(\tanh(\lambda x) + \tanh(\lambda y)) \le 2\sqrt{x^2 + y^2}$$


Abstracted: $$t_x^2 < 1 \land t_y^2 < 1 \rightarrow x\,(t_x - t_y) + y\,(t_x + t_y) \le 2\sqrt{x^2 + y^2}$$
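Why the abstracted formula holds can be seen via the Cauchy–Schwarz inequality: x(t_x − t_y) + y(t_x + t_y) ≤ √(x^2 + y^2)·√((t_x − t_y)^2 + (t_x + t_y)^2) = √(x^2 + y^2)·√(2t_x^2 + 2t_y^2) ≤ 2√(x^2 + y^2) whenever t_x^2 < 1 and t_y^2 < 1. The Python sketch below is our own illustration, not the package's abstraction mechanism: it samples the abstracted implication at random points and shows that the added bounds on t_x, t_y are really needed.

```python
import math
import random

def abstracted_formula(x, y, tx, ty):
    """The abstracted goal; tx and ty are fresh variables standing for tanh(λx) and tanh(λy)."""
    premise = tx * tx < 1 and ty * ty < 1
    conclusion = x * (tx - ty) + y * (tx + ty) <= 2 * math.sqrt(x * x + y * y)
    return (not premise) or conclusion

def sample_check(n=100_000, seed=0):
    """Evaluate the implication at n random points (tx, ty sampled strictly inside (-1, 1))."""
    rng = random.Random(seed)
    for _ in range(n):
        x, y = rng.uniform(-100, 100), rng.uniform(-100, 100)
        tx, ty = rng.uniform(-0.999, 0.999), rng.uniform(-0.999, 0.999)
        if not abstracted_formula(x, y, tx, ty):
            return False
    return True

# Without the added bounds the conclusion fails, e.g., for x = 0, y = 1, tx = ty = 2.
print(sample_check(), 0 * (2 - 2) + 1 * (2 + 2) <= 2 * math.sqrt(0 + 1))
```

Sampling only illustrates the claim; the formula itself is discharged by real arithmetic in the proof.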


4 Examples 


The definition package enables users to work with differentially-defined functions 
in KeYmaera X, including modeling and expressing their design intuitions in 
proofs. This section applies the package to verify various continuous and hybrid 
system examples from the literature featuring such functions. 


Discretely Driven Pendulum. The specification φ_p from Fig. 1 contains a discrete
loop whose safety property is proved by a loop invariant, i.e., a formula that is
preserved by the discrete and continuous dynamics in each loop iteration [21].
The key invariant is

$$Inv \equiv \frac{g}{L}(1 - \cos\theta) + \frac{1}{2}\omega^2 < \frac{g}{L},$$

which expresses that the total energy of the system (sum of potential and kinetic
energy on the LHS) is less than the energy needed to cross the horizontal (RHS).
The main steps are as follows (proofs for these steps are automated by KeYmaera X):


1. Inv → [if (½(ω − p)^2 < (g/L) cos(θ)) {ω := ω − p}] Inv, which shows that the
discrete guard only allows push p if it preserves the energy invariant, and

2. Inv → [{θ' = ω, ω' = −(g/L) sin(θ) − kω}] Inv, which shows that Inv is an energy
invariant of the pendulum's ODE.
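Both steps have a short justification. If the guard holds, the new energy is (g/L)(1 − cos θ) + ½(ω − p)^2 < (g/L)(1 − cos θ) + (g/L) cos θ = g/L. Along the ODE, the energy E = (g/L)(1 − cos θ) + ½ω^2 has derivative E' = (g/L) sin(θ)·ω + ω·(−(g/L) sin(θ) − kω) = −kω^2 ≤ 0 for k ≥ 0. The Python sketch below checks both facts numerically for the hypothetical values g/L = 2 and k = 0.3 (our choice; the model keeps them symbolic). It is a test harness, not a proof.

```python
import math
import random

# Hypothetical parameter values: the model keeps g/L and the damping k symbolic.
G_OVER_L = 2.0
K_DAMP = 0.3

def energy(theta, omega):
    """Left-hand side of Inv: (g/L)(1 - cos(theta)) + omega^2 / 2."""
    return G_OVER_L * (1 - math.cos(theta)) + 0.5 * omega ** 2

def push(theta, omega, p):
    """Discrete controller: apply push p only if 1/2 (omega - p)^2 < (g/L) cos(theta)."""
    if 0.5 * (omega - p) ** 2 < G_OVER_L * math.cos(theta):
        return omega - p
    return omega

def push_preserves_inv(samples=10_000, seed=0):
    """Step 1, sampled: states satisfying Inv still satisfy it after the controller
    (up to a 1e-12 floating-point tolerance)."""
    rng = random.Random(seed)
    for _ in range(samples):
        theta, omega, p = rng.uniform(-1.5, 1.5), rng.uniform(-2, 2), rng.uniform(-3, 3)
        if energy(theta, omega) < G_OVER_L and \
                energy(theta, push(theta, omega, p)) >= G_OVER_L + 1e-12:
            return False
    return True

def energies_along_flow(theta, omega, t_end=10.0, steps=10_000):
    """Step 2, simulated: RK4 on theta' = omega, omega' = -(g/L) sin(theta) - k omega,
    returning the energy after every step."""
    def f(th, om):
        return om, -G_OVER_L * math.sin(th) - K_DAMP * om
    h = t_end / steps
    out = [energy(theta, omega)]
    for _ in range(steps):
        a1, b1 = f(theta, omega)
        a2, b2 = f(theta + h / 2 * a1, omega + h / 2 * b1)
        a3, b3 = f(theta + h / 2 * a2, omega + h / 2 * b2)
        a4, b4 = f(theta + h * a3, omega + h * b3)
        theta += h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        omega += h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
        out.append(energy(theta, omega))
    return out

es = energies_along_flow(1.0, 0.5)
print(push_preserves_inv(), max(es) < G_OVER_L)
```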


Neuron Interaction. The ODE α_n models the interaction between a pair of
neurons [12]; its specification φ_n nests dL's diamond and box modalities to express
that the system norm (√(x^2 + y^2)) is asymptotically bounded by 2τ.


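$$\alpha_n \equiv x' = -\frac{x}{\tau} + \tanh(\lambda x) - \tanh(\lambda y),\quad y' = -\frac{y}{\tau} + \tanh(\lambda x) + \tanh(\lambda y)$$

$$\varphi_n \equiv \tau > 0 \rightarrow \forall \varepsilon > 0\; \langle \alpha_n \rangle [\alpha_n]\, \sqrt{x^2 + y^2} \le 2\tau + \varepsilon$$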


The verification of φ_n uses differentially-defined functions in concert with
KeYmaera X's symbolic ODE safety and liveness reasoning [26]. The proof uses
a decaying exponential bound √(x^2 + y^2) ≤ exp(−t/τ)·√(x0^2 + y0^2) + 2τ(1 − exp(−t/τ)),
where the constants x0, y0 are symbolic initial values for x, y at initial time t = 0,
respectively. Notably, the arithmetic subgoals from this example are all proved
using abstraction and differential unfolding (Sect. 3) without relying on external
arithmetic solver support for tanh.
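The decaying bound follows from the derivative of the norm: since the abstracted bound of Sect. 3 gives x(tanh(λx) − tanh(λy)) + y(tanh(λx) + tanh(λy)) ≤ 2√(x^2 + y^2), the norm r = √(x^2 + y^2) satisfies r' ≤ −r/τ + 2 (for r > 0), and the right-hand side of the stated bound is exactly the solution of b' = −b/τ + 2 with b(0) = r(0). The Python sketch below integrates α_n with RK4 for hypothetical parameter values (our choice) and checks the bound along the trajectory; it illustrates the claim and is no substitute for the KeYmaera X proof.

```python
import math

def neuron_rhs(x, y, tau, lam):
    """Right-hand side of alpha_n."""
    a, b = math.tanh(lam * x), math.tanh(lam * y)
    return -x / tau + a - b, -y / tau + a + b

def norm_bound_holds(x0, y0, tau, lam, t_end=20.0, steps=10_000, tol=1e-6):
    """Integrate alpha_n with RK4 and check, after every step, that
    sqrt(x^2 + y^2) <= exp(-t/tau) * sqrt(x0^2 + y0^2) + 2*tau*(1 - exp(-t/tau))."""
    h, x, y = t_end / steps, x0, y0
    r0 = math.hypot(x0, y0)
    for i in range(1, steps + 1):
        k1 = neuron_rhs(x, y, tau, lam)
        k2 = neuron_rhs(x + h / 2 * k1[0], y + h / 2 * k1[1], tau, lam)
        k3 = neuron_rhs(x + h / 2 * k2[0], y + h / 2 * k2[1], tau, lam)
        k4 = neuron_rhs(x + h * k3[0], y + h * k3[1], tau, lam)
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        decay = math.exp(-i * h / tau)
        if math.hypot(x, y) > decay * r0 + 2 * tau * (1 - decay) + tol:
            return False
    return True

# Hypothetical parameters: tau = 1, lambda = 2, starting outside the disc of radius 2*tau.
print(norm_bound_holds(3.0, -2.0, 1.0, 2.0))
```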


Longitudinal Flight Dynamics. The differential equations
α_a below describe the 6th-order longitudinal
motion of an airplane while climbing or descending
[10,25]. The airplane adjusts its pitch angle θ
with pitch rate q, which determines its axial velocity
u and vertical velocity w, and, in turn, range x
and altitude z (illustrated on the right). The physical parameters are: gravity g,
mass m, aerodynamic thrust and moment M along the lateral axis, aerodynamic
and thrust forces X, Z along x and z, respectively, and the moment of inertia
I_yy, see [10, Sect. 6.2].


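$$\alpha_a \equiv u' = \frac{X}{m} - g\sin(\theta) - qw,\quad w' = \frac{Z}{m} + g\cos(\theta) + qu,\quad q' = \frac{M}{I_{yy}},$$
$$x' = \cos(\theta)u + \sin(\theta)w,\quad z' = -\sin(\theta)u + \cos(\theta)w,\quad \theta' = q$$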


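The verification of specification J → [α_a] J shows that the safety envelope
J ≡ J1 ∧ J2 ∧ J3 is invariant along the flow of α_a, with algebraic invariants J_i:

$$J_1 \equiv \frac{M}{I_{yy}} z + g\theta + \left(\frac{X}{m} - qw\right)\cos(\theta) + \left(\frac{Z}{m} + qu\right)\sin(\theta) = 0$$

$$J_2 \equiv \frac{M}{I_{yy}} x - \left(\frac{Z}{m} + qu\right)\cos(\theta) + \left(\frac{X}{m} - qw\right)\sin(\theta) = 0 \qquad J_3 \equiv -q^2 + \frac{2M\theta}{I_{yy}} = 0$$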
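Each J_i is in fact a conserved quantity of α_a: differentiating J3, for instance, gives −2q·q' + 2(M/I_yy)·θ' = −2q·M/I_yy + 2(M/I_yy)·q = 0, and similar (longer) calculations hold for J1 and J2. The Python sketch below integrates α_a with RK4 for hypothetical parameter values and initial states (our choice) and reports how far J1, J2, J3 drift from their initial values; a drift at the level of the integration error is consistent with invariance.

```python
import math

# Hypothetical parameter values (the model keeps g, m, M, X, Z, I_yy symbolic).
G, MASS, MOMENT, XF, ZF, IYY = 9.81, 1.0, 0.5, 2.0, -1.0, 2.0

def rhs(s):
    """Right-hand side of alpha_a for the state s = (u, w, q, x, z, theta)."""
    u, w, q, x, z, th = s
    return (XF / MASS - G * math.sin(th) - q * w,
            ZF / MASS + G * math.cos(th) + q * u,
            MOMENT / IYY,
            math.cos(th) * u + math.sin(th) * w,
            -math.sin(th) * u + math.cos(th) * w,
            q)

def invariants(s):
    """Left-hand sides of J1, J2, J3."""
    u, w, q, x, z, th = s
    a = XF / MASS - q * w  # X/m - q w
    b = ZF / MASS + q * u  # Z/m + q u
    r = MOMENT / IYY       # M / I_yy
    j1 = r * z + G * th + a * math.cos(th) + b * math.sin(th)
    j2 = r * x - b * math.cos(th) + a * math.sin(th)
    j3 = -q ** 2 + 2 * r * th
    return j1, j2, j3

def rk4(s, h):
    """One classical Runge-Kutta step for s' = rhs(s)."""
    k1 = rhs(s)
    k2 = rhs(tuple(a + h / 2 * b for a, b in zip(s, k1)))
    k3 = rhs(tuple(a + h / 2 * b for a, b in zip(s, k2)))
    k4 = rhs(tuple(a + h * b for a, b in zip(s, k3)))
    return tuple(a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))

def max_drift(s0, t_end=5.0, steps=5000):
    """Largest deviation of (J1, J2, J3) from their initial values along an RK4 run."""
    j0 = invariants(s0)
    s, h, drift = s0, t_end / steps, 0.0
    for _ in range(steps):
        s = rk4(s, h)
        drift = max(drift, max(abs(a - b) for a, b in zip(invariants(s), j0)))
    return drift

print(max_drift((10.0, 1.0, 0.1, 0.0, 0.0, 0.05)))
```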


Additional examples are available in the supplement [8], including: a bouncing 
ball on a sinusoidal surface [6,13] and a robot collision avoidance model [15]. 
5 Conclusion


This work presents a convenient mechanism for extending the dL term language
with differentially-defined functions, thereby enlarging the class of real-world
systems amenable to modeling and formalization in KeYmaera X. Minimal
soundness-critical changes are made to the KeYmaera X kernel, which maintains
its trustworthiness while allowing the use of newly defined functions in
concert with all existing dL hybrid systems reasoning principles implemented in
KeYmaera X. Future work could formally verify these kernel changes by extending
the existing formalization of dL [3]. Further integration of external arithmetic
tools [1,9,23] will also help to broaden the classes of arithmetic sub-problems
that can be solved effectively in hybrid systems proofs.


Acknowledgments. We thank the anonymous reviewers for their helpful feedback 
Abstract. There exist several results on deciding termination and computing
runtime bounds for triangular weakly non-linear loops (twn-loops).
We show how to use results on such subclasses of programs where complexity
bounds are computable within incomplete approaches for complexity
analysis of full integer programs. To this end, we present a novel
modular approach which computes local runtime bounds for subprograms
which can be transformed into twn-loops. These local runtime
bounds are then lifted to global runtime bounds for the whole program.
The power of our approach is shown by our implementation in the tool
KoAT, which analyzes complexity of programs where all other state-of-the-art
tools fail.


1 Introduction 


Most approaches for automated complexity analysis of programs are based
on incomplete techniques like ranking functions (see, e.g., [1–4,6,11,12,18,20,21,31]).
However, there also exist numerous results on subclasses of programs
where questions concerning termination or complexity are decidable, e.g.,
[5,14,15,19,22,24,25,32,34]. In this work we consider the subclass of triangular
weakly non-linear loops (twn-loops), where there exist complete techniques for
analyzing termination and runtime complexity (we discuss the “completeness”
and decidability of these techniques below). An example for a twn-loop is:


$$\mathbf{while}\ (x_1^2 + x_3^5 < x_2 \land x_1 \neq 0)\ \mathbf{do}\ (x_1, x_2, x_3) \leftarrow (-2 \cdot x_1,\ 3 \cdot x_2 - 2 \cdot x_3^3,\ x_3) \qquad (1)$$


Its guard is a propositional formula over (possibly non-linear) polynomial inequations.
The update is weakly non-linear, i.e., no variable xi occurs non-linearly in its
own update. Furthermore, it is triangular, i.e., we can order the variables such
that the update of any xi does not depend on the variables x1, ..., x_{i−1} with
smaller indices. Then, by handling one variable after the other, one can compute
a closed form which corresponds to applying the loop's update n times. Using


Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) 
- 235950644 (Project GI 274/6-2) and DFG Research Training Group 2236 UnRAVeL. 
these closed forms, termination can be reduced to an existential formula over
Z [15] (whose validity is decidable for linear arithmetic and where SMT solvers
often also prove (in)validity in the non-linear case). In this way, one can show
that non-termination of twn-loops over Z is semi-decidable (and it is decidable
over the real numbers).

While termination of twn-loops over Z is not decidable, by using the closed 
forms, [19] presented a “complete” complexity analysis technique. More precisely, 
for every twn-loop over Z, it infers a polynomial which is an upper bound on 
the runtime for all those inputs where the loop terminates. So for all (possibly 
non-linear) terminating twn-loops over Z, the technique of [19] always computes 
polynomial runtime bounds. In contrast, existing tools based on incomplete techniques
for complexity analysis often fail for programs with non-linear arithmetic.

In [6,18] we presented such an incomplete modular technique for complexity
analysis which uses individual ranking functions for different subprograms.
Based on this, we now introduce a novel approach to automatically infer runtime
bounds for programs possibly consisting of multiple consecutive or nested loops
by handling some subprograms as twn-loops and by using ranking functions for
others. In order to compute runtime bounds, we analyze subprograms in topological
order, i.e., in case of multiple consecutive loops, we start with the first loop
and propagate knowledge about the resulting values of variables to subsequent
loops. By inferring runtime bounds for one subprogram after the other, in the
end we obtain a bound on the runtime complexity of the whole program. We first
try to compute runtime bounds for subprograms by so-called multiphase linear
ranking functions (MΦRFs, see [3,4,18,20]). If MΦRFs do not yield a finite runtime
bound for the respective subprogram, then we use our novel twn-technique
on the unsolved parts of the subprogram. So for the first time, “complete” complexity
analysis techniques like [19] for subclasses of programs with non-linear
arithmetic are combined with incomplete techniques based on (linear) ranking
functions like [6,18]. Based on our approach, in future work one could integrate
“complete” techniques for further subclasses (e.g., for solvable loops [24,25,30,34],
which can be transformed into twn-loops by suitable automorphisms [15]).


Structure: After introducing preliminaries in Sect. 2, in Sect. 3 we show how
to lift a (local) runtime bound which is only sound for a subprogram to an
overall global runtime bound. In contrast to previous techniques [6,18], our lifting
approach works for any method of bound computation (not only for ranking
functions). In Sect. 4, we improve the existing results on complexity analysis of
twn-loops [14,15,19] such that they yield concrete polynomial bounds, we refine
these bounds by considering invariants, and we show how to apply these results
to full programs which contain twn-loops as subprograms. Section 5 extends
this technique to larger subprograms which can be transformed into twn-loops.
In Sect. 6 we evaluate the implementation of our approach in the complexity
analysis tool KoAT and show that one can now also successfully analyze the
runtime of programs containing non-linear arithmetic. We refer to [26] for all
proofs.
[Graph of Fig. 1 not recoverable from the source; its self-loop t5 at ℓ3 has guard φ = x1^2 + x3^5 < x2 ∧ x1 ≠ 0 and update η(x1) = −2·x1, η(x2) = 3·x2 − 2·x3^3.]

Fig. 1. An Integer Program with a Nested Self-Loop


2 Preliminaries 


This section recapitulates preliminaries for complexity analysis from [6,18]. 


Definition 1 (Atoms and Formulas). We fix a set V of variables. The set
of atoms A(V) consists of all inequations p1 < p2 for polynomials p1, p2 ∈ Z[V].
F(V) is the set of all propositional formulas built from atoms A(V), ∧, and ∨.




In addition to “<”, we also use “≥”, “≠”, etc., and negations “¬”, which
can be simulated by formulas (e.g., p1 ≥ p2 is equivalent to p2 < p1 + 1 for
integers).

For integer programs, we use a formalism based on transitions, which also 
allows us to represent while-programs like (1) easily. Our programs may have 
non-deterministic branching, i.e., the guards of several applicable transitions 
can be satisfied. Moreover, non-deterministic sampling is modeled by temporary 
variables whose values are updated arbitrarily in each evaluation step. 


Definition 2 (Integer Program). (PV, L, ℓ0, T) is an integer program where

• PV ⊆ V is a finite set of program variables, and V \ PV are temporary variables,

• L is a finite set of locations with an initial location ℓ0 ∈ L, and

• T is a finite set of transitions. A transition is a 4-tuple (ℓ, φ, η, ℓ') with a start
location ℓ ∈ L, target location ℓ' ∈ L \ {ℓ0}, guard φ ∈ F(V), and update
function η : PV → Z[V] mapping program variables to update polynomials.


Transitions (ℓ0, _, _, _) are called initial. Note that ℓ0 has no incoming transitions.


Example 3. Consider the program in Fig. 1 with PV = {xi | 1 ≤ i ≤ 5}, L =
{ℓi | 0 ≤ i ≤ 3}, and T = {ti | 0 ≤ i ≤ 5}, where t5 has non-linear arithmetic
in its guard and update. We omitted trivial guards, i.e., φ = true, and identity
updates of the form η(v) = v. Thus, t5 corresponds to the while-program (1).


A state is a mapping σ : V → Z, Σ denotes the set of all states, and L × Σ
is the set of configurations. We also apply states to arithmetic expressions p or
formulas φ, where the number σ(p) resp. the Boolean value σ(φ) results from
replacing each variable v by σ(v). So for a state with σ(x1) = −8, σ(x2) = 55,
and σ(x3) = 1, the expression x1^2 + x3^5 evaluates to σ(x1^2 + x3^5) = 65 and the
formula φ = (x1^2 + x3^5 < x2) evaluates to σ(φ) = (65 < 55) = false. From now
on, we fix a program (PV, L, ℓ0, T).
Definition 4 (Evaluation of Programs). For configurations (ℓ, σ), (ℓ', σ')
and t = (ℓt, φ, η, ℓ't) ∈ T, (ℓ, σ) →t (ℓ', σ') is an evaluation step if ℓ = ℓt,
ℓ' = ℓ't, σ(φ) = true, and σ(η(v)) = σ'(v) for all v ∈ PV. Let →T = ∪_{t∈T} →t,
where we also write → instead of →t or →T. Let (ℓ0, σ0) →^k (ℓk, σk) abbreviate
(ℓ0, σ0) → ... → (ℓk, σk), and let (ℓ, σ) →* (ℓ', σ') if (ℓ, σ) →^k (ℓ', σ') for some
k ≥ 0.


So when denoting states σ as tuples (σ(x1), ..., σ(x5)) ∈ Z^5 for the
program in Fig. 1, we have (ℓ0, (1,5,7,1,3)) →t0 (ℓ1, (1,5,7,1,3)) →t1
(ℓ3, (1,1,3,1,3)) →t5 (ℓ3, (1,−8,55,1,3)) →t5 .... The runtime complexity
rc(σ0) of a program corresponds to the length of the longest evaluation starting
in the initial state σ0.


Definition 5 (Runtime Complexity). The runtime complexity is rc : Σ → N̄
with N̄ = N ∪ {ω} and rc(σ0) = sup{k ∈ N | ∃(ℓ', σ'). (ℓ0, σ0) →^k (ℓ', σ')}.
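To make Definitions 2, 4, and 5 concrete, the following Python sketch encodes transitions as (start, guard, update, target) tuples, computes rc(σ0) by exhaustively exploring all evaluations, and applies it to the twn-loop (1) preceded by an initial transition. The encoding, the function names, and the depth limit are illustrative assumptions (this is not how KoAT works), and the sketch raises an error rather than returning ω for long or infinite runs.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A transition (l, phi, eta, l'): guard and update are Python functions on states.
Transition = Tuple[str, Callable[[State], bool], Callable[[State], State], str]

def runtime_complexity(T: List[Transition], l0: str, sigma0: State, max_depth: int = 500) -> int:
    """rc(sigma0): length of the longest evaluation from (l0, sigma0), by exhaustive search.
    Assumes no temporary variables and runs shorter than max_depth (else it raises)."""
    def longest(loc: str, sigma: State, depth: int) -> int:
        if depth >= max_depth:
            raise RuntimeError("evaluation too long; rc may be omega")
        best = 0
        for start, guard, update, target in T:
            if start == loc and guard(sigma):  # an evaluation step with this transition
                best = max(best, 1 + longest(target, update(sigma), depth + 1))
        return best
    return longest(l0, sigma0, 0)

def phi(s: State) -> bool:
    """Guard of loop (1)."""
    return s["x1"] ** 2 + s["x3"] ** 5 < s["x2"] and s["x1"] != 0

def eta(s: State) -> State:
    """Update of loop (1); all right-hand sides use the old state."""
    return {"x1": -2 * s["x1"], "x2": 3 * s["x2"] - 2 * s["x3"] ** 3, "x3": s["x3"]}

# Loop (1) as a program: t0 = (l0, true, id, l) followed by the self-loop t = (l, phi, eta, l).
LOOP: List[Transition] = [("l0", lambda s: True, dict, "l"), ("l", phi, eta, "l")]

print(runtime_complexity(LOOP, "l0", {"x1": 1, "x2": 100, "x3": 1}))  # 17
```

For x1 = 1, x2 = 100, x3 = 1 the loop body runs while 4^n < 99·3^n, i.e., for n = 0, ..., 15, so rc(σ0) = 1 + 16 = 17.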


3 Computing Global Runtime Bounds 


We now introduce our general approach for computing (upper) runtime bounds. 
We use weakly monotonically increasing functions as bounds, since they can
easily be “composed” (i.e., if f and g increase monotonically, then so does f ∘ g).


Definition 6 (Bounds [6,18]). The set of bounds B is the smallest set with
N̄ ⊆ B, PV ⊆ B, and {b1 + b2, b1 · b2, k^{b1}} ⊆ B for all k ∈ N and b1, b2 ∈ B.


A bound constructed from N, PV, +, and · is polynomial. So for PV = {x, y},
we have ω, x^2, x + y, 2^{x+y} ∈ B. Here, x^2 and x + y are polynomial bounds.

We measure the size of variables by their absolute values. For any σ ∈ Σ, |σ|
is the state with |σ|(v) = |σ(v)| for all v ∈ V. So if σ0 denotes the initial state,
then |σ0| maps every variable to its initial “size”, i.e., its initial absolute value.
RB_glo : T → B is a global runtime bound if for each transition t and initial state
σ0 ∈ Σ, RB_glo(t) evaluated in the state |σ0| over-approximates the number of
evaluations of t in any run starting in the configuration (ℓ0, σ0). Let →*_T ∘ →t
denote the relation where arbitrarily many evaluation steps are followed by a step
with t.


Definition 7 (Global Runtime Bound [6,18]). The function RB_glo : T →
B is a global runtime bound if for all t ∈ T and all states σ0 ∈ Σ we have
|σ0|(RB_glo(t)) ≥ sup{k ∈ N | ∃(ℓ', σ'). (ℓ0, σ0) (→*_T ∘ →t)^k (ℓ', σ')}.


For the program in Fig. 1, in Example 12 we will infer RB_glo(t0) = 1,
RB_glo(ti) = x4 for 1 ≤ i ≤ 4, and RB_glo(t5) = 8·x4·x5 + 13006·x4. By
adding the bounds for all transitions, a global runtime bound RB_glo yields an
upper bound on the program's runtime complexity. So for all σ0 ∈ Σ we have
|σ0|(Σ_{t∈T} RB_glo(t)) ≥ rc(σ0).

For local runtime bounds, we consider the entry transitions of subsets T' ⊆ T.
Definition 8 (Entry Transitions [6,18]). Let ∅ ≠ T' ⊆ T. Its entry transitions
are E_{T'} = {t | t = (ℓ, φ, η, ℓ') ∈ T \ T' ∧ there is a transition (ℓ', _, _, _) ∈ T'}.


So in Fig. 1, we have E_{T \ {t0}} = {t0} and E_{{t5}} = {t1, t4}.
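Definition 8 only inspects the start and target locations of transitions, so entry transitions are easy to compute. Since the graph of Fig. 1 is not reproduced here, the Python sketch below uses a small hypothetical program of our own: t0 from l0 to l1, t1 from l1 to l2, a self-loop t2 at l2, and t3 from l2 back to l1.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (transition name, start location, target location)

def entry_transitions(T: List[Edge], sub: Set[str]) -> Set[str]:
    """E_{T'}: transitions outside T' that end in a location where some transition of T' starts."""
    starts = {start for name, start, _ in T if name in sub}
    return {name for name, _, target in T if name not in sub and target in starts}

# Hypothetical program graph (not Fig. 1).
T = [("t0", "l0", "l1"), ("t1", "l1", "l2"), ("t2", "l2", "l2"), ("t3", "l2", "l1")]
print(entry_transitions(T, {"t2"}), entry_transitions(T, {"t1", "t2", "t3"}))
```

As in the text, removing only the initial transition leaves it as the sole entry transition, while a self-loop is entered by every transition that ends at its location.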

In contrast to global runtime bounds, a local runtime bound RB_loc : E_{T'} → B
only takes a subset T' into account. A local run is started by an entry transition
r ∈ E_{T'} followed by transitions from T'. A local runtime bound considers a subset
T'_> ⊆ T' and over-approximates the number of evaluations of any transition from
T'_> in an arbitrary local run of the subprogram with the transitions T'. More
precisely, for every t ∈ T'_>, RB_loc(r) over-approximates the number of applications
of t in any run of T', if T' is entered via r ∈ E_{T'}. However, local runtime
bounds do not consider how often an entry transition from E_{T'} is evaluated or
how large a variable is when we evaluate an entry transition. To illustrate that
RB_loc(r) is a bound on the number of evaluations of transitions from T'_> after
evaluating r, we often write RB_loc(→r, T'_>) instead of RB_loc(r).


Definition 9 (Local Runtime Bound). Let ∅ ≠ T'_> ⊆ T' ⊆ T. The function
RB_loc : E_{T'} → B is a local runtime bound for T'_> w.r.t. T' if for all t ∈ T'_>,
all r ∈ E_{T'} with r = (_, _, _, ℓ), and all σ ∈ Σ we have |σ|(RB_loc(→r, T'_>)) ≥
sup{k ∈ N | ∃σ0, (ℓ', σ'). (ℓ0, σ0) (→*_T ∘ →r) (ℓ, σ) (→*_{T'} ∘ →t)^k (ℓ', σ')}.


Our approach is modular since it computes local bounds for program parts 
separately. To lift local to global runtime bounds, we use size bounds SB(t, v) to 
over-approximate the size (i.e., absolute value) of the variable v after evaluating t 
in any run of the program. See [6] for the automatic computation of size bounds. 


Definition 10 (Size Bound [6,18]). The function SB : (T × PV) → B
is a size bound if for all (t, v) ∈ T × PV and all states σ0 ∈ Σ we have
|σ0|(SB(t, v)) ≥ sup{|σ'(v)| | ∃(ℓ', σ'). (ℓ0, σ0) (→*_T ∘ →t) (ℓ', σ')}.


To compute global from local runtime bounds RB_loc(→r, T'_>) and size bounds
SB(r, v), Theorem 11 generalizes the approach of [6,18]. Each local run is started
by an entry transition r. Hence, we use an already computed global runtime
bound RB_glo(r) to over-approximate the number of times that such a local run
is started. To over-approximate the size of each variable v when entering the local
run, we instantiate it by the size bound SB(r, v). So size bounds on previous transitions
are needed to compute runtime bounds, and similarly, runtime bounds are
needed to compute size bounds in [6]. For any bound b, “b [v/SB(r, v) | v ∈ PV]”
results from b by replacing every program variable v by SB(r, v). Here, weak
monotonic increase of b ensures that the over-approximation of the variables v
in b by SB(r, v) indeed also leads to an over-approximation of b. The analysis
starts with an initial runtime bound RB_glo and an initial size bound SB which
map all transitions resp. all pairs from T × PV to ω, except for the transitions t
which do not occur in cycles of T, where RB_glo(t) = 1. Afterwards, RB_glo and
SB are refined repeatedly, where we alternate between computing runtime and
size bounds.
Theorem 11 (Computing Global Runtime Bounds). Let RB_glo be a
global runtime bound, SB be a size bound, and ∅ ≠ T'_> ⊆ T' ⊆ T such that
T' contains no initial transitions. Moreover, let RB_loc be a local runtime bound
for T'_> w.r.t. T'. Then RB'_glo is also a global runtime bound, where for all t ∈ T
we define:

$$RB'_{glo}(t) = \begin{cases} RB_{glo}(t), & \text{if } t \in T \setminus T'_{>} \\ \sum_{r \in E_{T'}} RB_{glo}(r) \cdot \big(RB_{loc}(\to_r, T'_{>})\,[v / SB(r, v) \mid v \in PV]\big), & \text{if } t \in T'_{>} \end{cases}$$


Example 12. For the example in Fig. 1, we first use T'_> = {t2} and T' = T \
{t0}. With the ranking function x4 one obtains RB_loc(→t0, T'_>) = x4, since t2
decreases the value of x4 and no transition increases it. Then we can infer the
global runtime bound RB_glo(t2) = RB_glo(t0) · (x4 [v/SB(t0, v) | v ∈ PV]) = x4, as
RB_glo(t0) = 1 (since t0 is evaluated at most once) and SB(t0, x4) = x4 (since t0
does not change any variables). Similarly, we can infer RB_glo(t1) = RB_glo(t3) =
RB_glo(t4) = x4.

For T'_> = T' = {t5}, our twn-approach in Sect. 4 will infer the local runtime
bound RB_loc : E_{{t5}} → B with RB_loc(→t1, {t5}) = 4·x2 + 3 and RB_loc(→t4,
{t5}) = 4·x2 + 4·x3^3 + 4·x3^5 + 3 in Example 30. By Theorem 11 we obtain the
global bound


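$$\begin{aligned}
RB_{glo}(t_5) &= RB_{glo}(t_1) \cdot \big(RB_{loc}(\to_{t_1}, \{t_5\})[v/SB(t_1, v) \mid v \in PV]\big) + RB_{glo}(t_4) \cdot \big(RB_{loc}(\to_{t_4}, \{t_5\})[v/SB(t_4, v) \mid v \in PV]\big) \\
&= x_4 \cdot (4 \cdot x_5 + 3) + x_4 \cdot (4 \cdot x_5 + 4 \cdot 5^3 + 4 \cdot 5^5 + 3) \qquad (\text{as } SB(t_1, x_2) = SB(t_4, x_2) = x_5 \text{ and } SB(t_4, x_3) = 5) \\
&= 8 \cdot x_4 \cdot x_5 + 13006 \cdot x_4.
\end{aligned}$$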
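The combination step of Theorem 11 for t ∈ T'_> can be replayed mechanically. The Python sketch below represents bounds as functions from states of absolute values to integers, which is our own simplification of the bound language B, and recomputes RB_glo(t5) from the local and size bounds quoted in Example 12.

```python
from typing import Callable, Dict, List, Tuple

Sizes = Dict[str, int]
Bound = Callable[[Sizes], int]

def lift(entries: List[str], rb_glo: Dict[str, Bound], rb_loc: Dict[str, Bound],
         sb: Dict[Tuple[str, str], Bound]) -> Bound:
    """Theorem 11 for t in T'_>: the sum over entry transitions r of
    RB_glo(r) * (RB_loc(->r, T'_>)[v / SB(r, v) | v in PV])."""
    def bound(sizes: Sizes) -> int:
        total = 0
        for r in entries:
            # Instantiate each program variable by its size bound after r.
            instantiated = {v: f(sizes) for (rr, v), f in sb.items() if rr == r}
            total += rb_glo[r](sizes) * rb_loc[r](instantiated)
        return total
    return bound

# Data of Example 12 (only the size bounds that the local bounds actually use are listed).
rb_glo = {"t1": lambda s: s["x4"], "t4": lambda s: s["x4"]}
rb_loc = {"t1": lambda s: 4 * s["x2"] + 3,
          "t4": lambda s: 4 * s["x2"] + 4 * s["x3"] ** 3 + 4 * s["x3"] ** 5 + 3}
sb = {("t1", "x2"): lambda s: s["x5"],
      ("t4", "x2"): lambda s: s["x5"],
      ("t4", "x3"): lambda s: 5}
rb_t5 = lift(["t1", "t4"], rb_glo, rb_loc, sb)
print(rb_t5({"x4": 2, "x5": 3}))  # 26060 = 8*2*3 + 13006*2
```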


Thus, rc(σ0) ∈ O(n^2), where n is the largest initial absolute value of all program
variables. While the approach of [6,18] was limited to local bounds resulting from
ranking functions, here we need our Theorem 11. It allows us to use both local
bounds resulting from twn-loops (for the non-linear transition t5, where tools
based on ranking functions cannot infer a bound, see Sect. 6) and local bounds
resulting from ranking functions (for t1, ..., t4, since our twn-approach of Sects. 4
and 5 is limited to so-called simple cycles and cannot handle the full program).

In contrast to [6,18], we allow different local bounds for different entry transitions
in Definition 9 and Theorem 11. Our example demonstrates that this can
indeed lead to a smaller asymptotic bound for the whole program: by distinguishing
the cases where t5 is reached via t1 or t4, we end up with a quadratic
bound, because the local bound RB_loc(→t1, {t5}) is linear, and while x3 occurs
with degrees 5 and 3 in RB_loc(→t4, {t5}), the size bound for x3 is constant after
t3 and t4.


To improve size and runtime bounds repeatedly, we treat the strongly connected
components (SCCs)¹ of the program in topological order such that


¹ As usual, a graph is strongly connected if there is a path from every node to every
other node. A strongly connected component is a maximal strongly connected subgraph.
improved bounds for previous transitions are already available when handling
the next SCC. We first try to infer local runtime bounds by multiphase-linear
ranking functions (see [18], which also contains a heuristic for choosing T'_> and
T' when using ranking functions). If ranking functions do not yield finite local
bounds for all transitions of the SCC, then we apply the twn-technique from
Sects. 4 and 5 on the remaining unbounded transitions (see Sect. 5 for choosing
T'_> and T' in that case). Afterwards, the global runtime bound is updated
according to Theorem 11.
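Processing SCCs in topological order requires an SCC decomposition. One standard way, sketched below in Python for a hypothetical location graph of our own (not Fig. 1), is Tarjan's algorithm, which emits each SCC only after all SCCs reachable from it; reversing its output therefore yields a topological order. This is a generic sketch and makes no claim about how KoAT computes SCCs.

```python
from typing import Dict, List

def sccs_topological(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Tarjan's algorithm; returns the SCCs so that each one precedes every SCC it can reach."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    on_stack = set()
    stack: List[str] = []
    result: List[List[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop it off the stack
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            result.append(sorted(component))

    for v in graph:
        if v not in index:
            visit(v)
    return result[::-1]  # Tarjan emits SCCs in reverse topological order

# Hypothetical graph: l1 and l2 form a cycle (with a self-loop at l2), entered from l0, exiting to l3.
graph = {"l0": ["l1"], "l1": ["l2"], "l2": ["l2", "l1", "l3"], "l3": []}
print(sccs_topological(graph))  # [['l0'], ['l1', 'l2'], ['l3']]
```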


4 Local Runtime Bounds for Twn-Self-Loops 


In Sect. 4.1 we recapitulate twn-loops and their termination in our setting. Then
in Sect. 4.2 we present a (complete) algorithm to infer polynomial runtime
bounds for all terminating twn-loops. Compared to [19], we increased its precision
considerably by computing bounds that take the different roles of the
variables into account and by using over-approximations to remove monomials.
Moreover, we show how our algorithm can be used to infer local runtime bounds
for twn-loops occurring in integer programs. Section 5 will show that our algorithm
can also be applied to infer runtime bounds for larger cycles in programs
instead of just self-loops.


4.1 Termination of Twn-Loops 


Definition 13 extends the definition of twn-loops in [15,19] by an initial transition
and an update-invariant. Here, ψ is an update-invariant if ⊨ ψ → η(ψ), where
η is the update of the transition (i.e., invariance must hold independent of the
guard).


Definition 13 (Twn-Loop). An integer program (PV, L, ℓ0, T) is a triangular
weakly non-linear loop (twn-loop) if PV = {x1, ..., xd} for some d ≥ 1,
L = {ℓ0, ℓ}, and T = {t0, t} with t0 = (ℓ0, ψ, id, ℓ) and t = (ℓ, φ, η, ℓ) for some
ψ, φ ∈ F(PV) with ⊨ ψ → η(ψ), where id(v) = v for all v ∈ PV, and for all
1 ≤ i ≤ d we have η(xi) = ci · xi + pi for some ci ∈ Z and some polynomial
pi ∈ Z[x_{i+1}, ..., xd]. We often denote the loop by (ψ, φ, η) and refer to ψ, φ, η
as its (update-) invariant, guard, and update, respectively. If ci ≥ 0 holds for all
1 ≤ i ≤ d, then the program is a non-negative triangular weakly non-linear loop
(tnn-loop).


Example 14. The program consisting of the initial transition (ℓ0, true, id, ℓ3) and
the self-loop t5 in Fig. 1 is a twn-loop (corresponding to the while-loop (1)). This
loop terminates, as every iteration increases x1^2 by a factor of 4, whereas x2 is only
tripled (up to the additive term −2·x3^3). Thus, x1^2 + x3^5 eventually outgrows the value of x2.


To transform programs into twn- or tnn-form, one can combine subsequent
transitions by chaining. Here, similar to states σ, we also apply the update η to
polynomials and formulas by replacing each program variable v by η(v).
Definition 15 (Chaining). Let t1, ..., tn be a sequence of transitions without
temporary variables where ti = (ℓi, φi, ηi, ℓ_{i+1}) for all 1 ≤ i ≤ n, i.e., the
target location of ti is the start location of t_{i+1}. We may have ti = tj for i ≠ j,
i.e., a transition may occur several times in the sequence. Then the transition
t1 ⋆ ... ⋆ tn = (ℓ1, φ, η, ℓ_{n+1}) results from chaining t1, ..., tn where

$$\varphi = \varphi_1 \land \eta_1(\varphi_2) \land \eta_1(\eta_2(\varphi_3)) \land \dots \land \eta_1(\dots \eta_{n-1}(\varphi_n) \dots)$$
$$\eta(v) = \eta_1(\dots \eta_n(v) \dots) \text{ for all } v \in PV, \text{ i.e., } \eta = \eta_1 \circ \dots \circ \eta_n.$$


Similar to [15,19], we can restrict ourselves to tnn-loops, since chaining transforms
any twn-loop L into a tnn-loop L ⋆ L. Chaining preserves the termination
behavior, and a bound on the runtime of L ⋆ L can be transformed into a bound for L.


Lemma 16 (Chaining Preserves Asymptotic Runtime, see [19, Lemma
18]). For the twn-loop L = (ψ, φ, η) with the transitions t0 = (ℓ0, ψ, id, ℓ), t =
(ℓ, φ, η, ℓ), and runtime complexity rc_L, the program L ⋆ L with the transitions t0
and t ⋆ t = (ℓ, φ ∧ η(φ), η ∘ η, ℓ) is a tnn-loop. For its runtime complexity rc_{L⋆L},
we have 2·rc_{L⋆L}(σ) ≤ rc_L(σ) ≤ 2·rc_{L⋆L}(σ) + 1 for all σ ∈ Σ.


Example 17. The program of Example 14 is only a twn-loop and not a tnn-loop,
as x1 occurs with a negative coefficient −2 in its own update. Hence, we
chain the loop and consider t5 ⋆ t5. The update of t5 ⋆ t5 is (η ∘ η)(x1) = 4·x1,
(η ∘ η)(x2) = 9·x2 − 8·x3^3, and (η ∘ η)(x3) = x3. To ease the presentation, in
this example we will keep the guard φ instead of using φ ∧ η(φ) (ignoring η(φ)
in the conjunction of the guard does not decrease the runtime complexity).
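Chaining can be executed directly on concrete states: the chained guard φ ∧ η(φ) checks φ both before and after one application of η, and the chained update applies η twice. The Python sketch below (our own encoding of loop (1)) builds t5 ⋆ t5 this way and checks on a grid that its update matches (4·x1, 9·x2 − 8·x3^3, x3), and that the numbers of loop iterations k of t5 and k⋆ of t5 ⋆ t5 satisfy 2·k⋆ ≤ k ≤ 2·k⋆ + 1, a relation analogous to Lemma 16 stated here for the loop transition only.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]
Guard = Callable[[State], bool]
Update = Callable[[State], State]

def phi(s: State) -> bool:
    """Guard of loop (1): x1^2 + x3^5 < x2 and x1 != 0."""
    return s["x1"] ** 2 + s["x3"] ** 5 < s["x2"] and s["x1"] != 0

def eta(s: State) -> State:
    """Update of loop (1); all right-hand sides are evaluated in the old state."""
    return {"x1": -2 * s["x1"], "x2": 3 * s["x2"] - 2 * s["x3"] ** 3, "x3": s["x3"]}

def chain(g1: Guard, u1: Update, g2: Guard, u2: Update) -> Tuple[Guard, Update]:
    """t1 * t2 on states: guard phi1 and eta1(phi2), i.e., phi2 is checked after u1;
    the update runs u1 then u2 (which is eta1 o eta2 read as a substitution)."""
    return (lambda s: g1(s) and g2(u1(s))), (lambda s: u2(u1(s)))

def iterations(g: Guard, u: Update, s: State, cap: int = 10_000) -> int:
    """Number of loop iterations from s; raises if the loop runs longer than cap."""
    k = 0
    while g(s):
        s, k = u(s), k + 1
        if k > cap:
            raise RuntimeError("no termination within cap")
    return k

phi2, eta2 = chain(phi, eta, phi, eta)
print(eta2({"x1": 1, "x2": 2, "x3": 3}))  # {'x1': 4, 'x2': -198, 'x3': 3}
```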


Our algorithm starts with computing a closed form for the loop update,
which describes the values of the program variables after n iterations of the
loop. Formally, a tuple of arithmetic expressions cl_η^n = (cl_{x1}^n, ..., cl_{xd}^n) over
the variables x = (x1, ..., xd) and the distinguished variable n is a (normalized)
closed form for the update η with start value n0 ≥ 0 if for all 1 ≤ i ≤ d
and all σ : {x1, ..., xd, n} → Z with σ(n) ≥ n0, we have σ(cl_{xi}^n) = σ(η^n(xi)).
As shown in [14,15,19], for tnn-loops such a normalized closed form and the
start value n0 can be computed by handling one variable after the other, and
these normalized closed forms can be represented as so-called normalized
poly-exponential expressions. Here, N_{≥m} stands for {x ∈ N | x ≥ m}.


Definition 18 (Normalized Poly-Exponential Expression [14,15,19]).
Let PV = {x1, ..., xd}. Then we define the set of all normalized poly-exponential
expressions by

$$NPE = \Big\{ \sum_{j=1}^{\ell} p_j \cdot n^{a_j} \cdot b_j^{\,n} \;\Big|\; \ell, a_j \in N,\ p_j \in Q[PV],\ b_j \in N_{\ge 1} \Big\}.$$


Example 19. A normalized closed form (with start value n0 = 0) for the tnn-loop
in Example 17 is cl_{x1}^n = x1 · 4^n, cl_{x2}^n = (x2 − x3^3) · 9^n + x3^3, and cl_{x3}^n = x3.
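The closed form of Example 19 can be checked against repeated application of the chained update. The short Python sketch below does so for small start values; it only tests the formula (with start value n0 = 0) and does not implement the closed-form computation of [14,15,19].

```python
def closed_form(x1: int, x2: int, x3: int, n: int):
    """Example 19: (x1 * 4^n, (x2 - x3^3) * 9^n + x3^3, x3)."""
    return x1 * 4 ** n, (x2 - x3 ** 3) * 9 ** n + x3 ** 3, x3

def iterate_chained(x1: int, x2: int, x3: int, n: int):
    """Apply the chained update (4*x1, 9*x2 - 8*x3^3, x3) n times."""
    for _ in range(n):
        x1, x2, x3 = 4 * x1, 9 * x2 - 8 * x3 ** 3, x3
    return x1, x2, x3

print(closed_form(1, 2, 3, 2), iterate_chained(1, 2, 3, 2))  # both (16, -1998, 3)
```

The x2 component works because x3^3 is a fixed point of x2 ↦ 9·x2 − 8·x3^3, so the distance to it is multiplied by 9 in every iteration.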


Using the normalized closed form, similar to [15] one can represent non-termination
of a tnn-loop (ψ, φ, η) by the formula

$$\exists x \in Z^d,\ m \in N.\ \forall n \in N_{\ge m}.\ \psi \land \varphi[x / cl_\eta^n]. \qquad (2)$$
Here, φ[x/cl_η^n] means that each variable xi in φ is replaced by cl_{xi}^n. Since ψ
is an update-invariant, if ψ holds, then ψ[x/cl_η^n] holds as well for all n ≥ n0.
Hence, whenever ∀n ∈ N_{≥m}. ψ ∧ φ[x/cl_η^n] holds, then cl_η^{max(n0,m)} witnesses
non-termination. Thus, invalidity of (2) is equivalent to termination of the loop.

Normalized poly-exponential expressions have the advantage that it is always
clear which addend determines their asymptotic growth when increasing n. So,
as in [15], (2) can be transformed into an existential formula, and we use an SMT
solver to prove its invalidity in order to prove termination of the loop. As shown
in [15, Theorem 42], non-termination of twn-loops over Z is semi-decidable, and
deciding termination is coNP-complete if the loop is linear and the eigenvalues
of the update matrix are rational.


4.2 Runtime Bounds for Twn-Loops via Stabilization Thresholds 


As observed in [19], since the closed forms for tnn-loops are poly-exponential
expressions that are weakly monotonic in n, every tnn-loop (ψ, φ, η) stabilizes
for each input e ∈ Z^d. So there is a number of loop iterations (a stabilization
threshold sth_{(ψ,φ,η)}(e)) such that the truth value of the loop guard φ does not
change anymore when performing further loop iterations. Hence, the runtime of
every terminating tnn-loop is bounded by its stabilization threshold.


Definition 20 (Stabilization Threshold). Let (ψ, φ, η) be a tnn-loop with
PV = {x1, ..., xd}. For each e = (e1, ..., ed) ∈ Z^d, let σe ∈ Σ with σe(xi) = ei
for all 1 ≤ i ≤ d. Let Ψ ⊆ Z^d such that e ∈ Ψ iff σe(ψ) holds. Then sth_{(ψ,φ,η)} :
Z^d → N is the stabilization threshold of (ψ, φ, η) if for all e ∈ Ψ, sth_{(ψ,φ,η)}(e)
is the smallest number such that σe(η^n(φ) ↔ η^{sth_{(ψ,φ,η)}(e)}(φ)) holds for all
n ≥ sth_{(ψ,φ,η)}(e).


For the tnn-loop from Example 17, it will turn out that 2·x2 + 2·x3^3 + 2·x3^5 + 1
is an upper bound on its stabilization threshold, see Example 28.
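The claimed bound can be probed by brute force. The Python sketch below evaluates the guard φ (with ψ = true, and keeping φ instead of φ ∧ η(φ) as in Example 17) on the closed form of Example 19 for n = 0, 1, ..., horizon − 1, and returns the smallest index from which the truth value stays constant. The finite horizon is our own assumption standing in for the "for all n" of Definition 20; this is not the construction via monotonicity thresholds.

```python
def guard_at(e, n):
    """phi[x / cl^n] for Examples 17 and 19 (psi = true): x1^2 + x3^5 < x2 and x1 != 0,
    evaluated on the closed form after n iterations of the chained loop."""
    x1, x2, x3 = e
    c1, c2, c3 = x1 * 4 ** n, (x2 - x3 ** 3) * 9 ** n + x3 ** 3, x3
    return c1 ** 2 + c3 ** 5 < c2 and c1 != 0

def sth_brute_force(e, horizon=60):
    """Smallest s with guard_at(e, n) == guard_at(e, s) for all s <= n < horizon.
    Assumes, without checking, that nothing changes beyond the horizon."""
    values = [guard_at(e, n) for n in range(horizon)]
    s = horizon - 1
    while s > 0 and values[s - 1] == values[-1]:
        s -= 1
    return s

def claimed_bound(e):
    """2*x2 + 2*x3^3 + 2*x3^5 + 1, evaluated on absolute values."""
    _, x2, x3 = (abs(v) for v in e)
    return 2 * x2 + 2 * x3 ** 3 + 2 * x3 ** 5 + 1

print(sth_brute_force((1, 100, 1)), claimed_bound((1, 100, 1)))  # 8 205
```

For x1 = 1, x2 = 100, x3 = 1 the guard holds for n = 0, ..., 7 (as (16/9)^8 > 99), so the brute-force threshold is 8, well below the bound 205.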

To compute such upper bounds on a tnn-loop's stabilization threshold (i.e.,
upper bounds on its runtime if the loop is terminating), we now present a construction
based on monotonicity thresholds, which are computable [19, Lemma
12].


Definition 21 (Monotonicity Threshold [19]). Let (b1, a1), (b2, a2) ∈ N^2
such that (b1, a1) >_lex (b2, a2) (i.e., b1 > b2, or both b1 = b2 and a1 > a2). For
any k ∈ N_{≥1}, the k-monotonicity threshold of (b1, a1) and (b2, a2) is the smallest
n0 ∈ N such that for all n ≥ n0 we have n^{a1} · b1^n > k · n^{a2} · b2^n.


For example, the 1-monotonicity threshold of (4, 0) and (3, 1) is 7, as the largest
root of f(n) = 4^n − n · 3^n is approximately 6.5139.
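The threshold in this example can also be found by searching for the last n at which the inequality fails (here 4^6 = 4096 < 6·3^6 = 4374, while 4^7 = 16384 > 7·3^7 = 15309). The Python sketch below does this up to a fixed horizon, which is our own simplifying assumption; [19, Lemma 12] shows how to compute the threshold exactly.

```python
def monotonicity_threshold(b1: int, a1: int, b2: int, a2: int,
                           k: int = 1, horizon: int = 1000) -> int:
    """Smallest n0 with n^a1 * b1^n > k * n^a2 * b2^n for all n0 <= n < horizon.
    Requires (b1, a1) >lex (b2, a2); the finite horizon replaces 'for all n >= n0'."""
    assert (b1, a1) > (b2, a2), "Definition 21 needs (b1, a1) >lex (b2, a2)"
    n0 = 0
    for n in range(horizon):
        if not (n ** a1 * b1 ** n > k * n ** a2 * b2 ** n):
            n0 = n + 1  # the inequality fails at n, so the threshold lies above n
    return n0

print(monotonicity_threshold(4, 0, 3, 1))  # 7
```

Python's tuple comparison is lexicographic, which matches >_lex in Definition 21.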

Our procedure again instantiates the variables of the loop guard φ by the normalized
closed form cl_η^n of the loop's update. However, in the poly-exponential
expressions Σ_j p_j · n^{a_j} · b_j^n resulting from φ[x/cl_η^n], the corresponding technique
of [19, Lemma 21] over-approximated the polynomials p_j by a polynomial
that did not distinguish the effects of the different variables x1, ..., xd. Such an
over-approximation is only useful for a direct asymptotic bound on the runtime
of the twn-loop, but it is too coarse for a useful local runtime bound within the
complexity analysis of a larger program. For instance, in Example 12 it is crucial
to obtain local bounds like 4·x2 + 4·x3^3 + 4·x3^5 + 3, which indicate that only
the variable x3 may influence the runtime with an exponent of 3 or 5. Thus, if
the size of x3 is bounded by a constant, then the resulting global bound becomes
linear.

So we now improve precision and over-approximate the polynomials p_j by
the polynomial ⊔{p1, ..., pℓ}, which contains every monomial x1^{e1} · ... · xd^{ed} of
{p1, ..., pℓ}, using the largest absolute value among the coefficients with which the
monomial occurs in {p1, ..., pℓ}. Thus, ⊔{x3^3 − x3^5, x2 − x3^3} = x2 + x3^3 + x3^5. In
the following, let x = (x1, ..., xd), and for e = (e1, ..., ed) ∈ N^d, x^e denotes
x1^{e1} · ... · xd^{ed}.


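Definition 22 (Over-Approximation of Polynomials). Let $p_1, \ldots, p_\ell \in \mathbb{Z}[x]$, and for all $1 \leq j \leq \ell$, let $I_j \subseteq (\mathbb{Z} \setminus \{0\}) \times \mathbb{N}^d$ be the index set of the
polynomial $p_j$, where $p_j = \sum_{(c,e) \in I_j} c \cdot x^e$ and there are no $c \neq c'$ with $(c,e), (c',e) \in I_j$.
For all $e \in \mathbb{N}^d$ we define $c_e \in \mathbb{N}$ with $c_e = \max\{|c| \mid (c,e) \in I_1 \cup \ldots \cup I_\ell\}$,
where $\max \emptyset = 0$. Then the over-approximation of $p_1, \ldots, p_\ell$ is $\sqcup\{p_1, \ldots, p_\ell\} = \sum_{e \in \mathbb{N}^d} c_e \cdot x^e$.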


Clearly, $\sqcup\{p_1, \ldots, p_\ell\}$ indeed over-approximates the absolute value of each $p_j$.


Corollary 23 (Soundness of $\sqcup\{p_1, \ldots, p_\ell\}$). For all $\sigma : \{x_1, \ldots, x_d\} \to \mathbb{Z}$
and all $1 \leq j \leq \ell$, we have $|\sigma|(\sqcup\{p_1, \ldots, p_\ell\}) \geq |\sigma(p_j)|$.
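Definition 22 translates directly into code. The sketch below represents a polynomial as a Python dictionary from exponent vectors $e$ to non-zero integer coefficients. This representation and the names `sqcup` and `evaluate` are hypothetical choices for illustration. It reproduces the example $\sqcup\{x_3^3 - x_3, x_2 - x_3^3\} = x_2 + x_3 + x_3^3$ and spot-checks Corollary 23 on a small grid of assignments.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Exponents = Tuple[int, ...]        # e = (e_1, ..., e_d) stands for x_1^e_1 * ... * x_d^e_d
Polynomial = Dict[Exponents, int]  # x^e -> non-zero integer coefficient c


def sqcup(polys: Iterable[Polynomial]) -> Polynomial:
    """Over-approximation of Definition 22: each monomial x^e occurring in some
    p_j gets the coefficient c_e = max{|c| : (c, e) occurs in some p_j}."""
    result: Polynomial = {}
    for p in polys:
        for e, c in p.items():
            if c != 0:
                result[e] = max(result.get(e, 0), abs(c))
    return result


def evaluate(p: Polynomial, values: Tuple[int, ...]) -> int:
    """Value of p at x = values."""
    total = 0
    for e, c in p.items():
        term = c
        for v, k in zip(values, e):
            term *= v ** k
        total += term
    return total


# Example from the text over (x1, x2, x3): p1 = x3^3 - x3, p2 = x2 - x3^3.
p1 = {(0, 0, 3): 1, (0, 0, 1): -1}
p2 = {(0, 1, 0): 1, (0, 0, 3): -1}
u = sqcup([p1, p2])
print(u)  # {(0, 0, 3): 1, (0, 0, 1): 1, (0, 1, 0): 1}, i.e. x3^3 + x3 + x2

# Spot-check of Corollary 23: |sigma|(u) >= |sigma(p_j)| on a small grid.
for sigma in product(range(-3, 4), repeat=3):
    abs_sigma = tuple(abs(v) for v in sigma)
    assert all(evaluate(u, abs_sigma) >= abs(evaluate(p, sigma)) for p in (p1, p2))
```

Taking the maximum of absolute values, rather than summing coefficients, keeps the result as small as possible while still dominating every $p_j$ by the triangle inequality.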


A drawback is that $\sqcup\{p_1, \ldots, p_\ell\}$ considers all monomials and, to obtain
weakly monotonically increasing bounds from $\mathcal{B}$, it uses the absolute values of
their coefficients. This can lead to polynomials of unnecessarily high degree. To
improve the precision of the resulting bounds, we now allow over-approximating
the poly-exponential expressions $\sum_j p_j \cdot n^{a_j} \cdot b_j^n$ which result from instantiating
the variables of the loop guard by the closed form. For this over-approximation,
we take the invariant $\psi$ of the tnn-loop into account. So while (2) showed that
update-invariants $\psi$ can restrict the sets of possible witnesses for non-termination
and thus simplify the termination proofs of twn-loops, we now show that
preconditions $\psi$ can also be useful to improve the bounds on twn-loops.

More precisely, Definition 24 allows us to replace addends $p \cdot n^a \cdot b^n$ by $p \cdot n^i \cdot j^n$
where $(j,i) >_{\mathrm{lex}} (b,a)$ if the monomial $p$ is always positive (when the precondition
$\psi$ is fulfilled) and where $(b,a) >_{\mathrm{lex}} (j,i)$ if $p$ is always non-positive.


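Definition 24 (Over-Approximation of Poly-Exponential Expressions).
Let $\psi \in \mathcal{F}(\mathcal{PV})$ and let $npe = \sum_{(p,a,b) \in \Delta} p \cdot n^a \cdot b^n \in \mathbb{NPE}$, where $\Delta$ is a set of
tuples $(p,a,b)$ containing a monomial² $p$ and two numbers $a, b \in \mathbb{N}$. Here, we
may have $(p,a,b), (p',a,b) \in \Delta$ for $p \neq p'$. Let $\Lambda, \Gamma \subseteq \Delta$ such that $\models \psi \to (p > 0)$
holds for all $(p,a,b) \in \Lambda$ and $\models \psi \to (p < 0)$ holds for all $(p,a,b) \in \Gamma$.³ Then

$$\lceil npe \rceil^{\psi}_{\Lambda,\Gamma} = \sum_{(p,a,b) \in \Lambda \cup \Gamma} p \cdot n^{i_{(p,a,b)}} \cdot j_{(p,a,b)}^n + \sum_{(p,a,b) \in \Delta \setminus (\Lambda \cup \Gamma)} p \cdot n^a \cdot b^n$$

is an over-approximation of $npe$ if $i_{(p,a,b)}, j_{(p,a,b)} \in \mathbb{N}$ are numbers such that
$(j_{(p,a,b)}, i_{(p,a,b)}) >_{\mathrm{lex}} (b,a)$ holds if $(p,a,b) \in \Lambda$ and $(b,a) >_{\mathrm{lex}} (j_{(p,a,b)}, i_{(p,a,b)})$
holds if $(p,a,b) \in \Gamma$. Note that $i_{(p,a,b)}$ or $j_{(p,a,b)}$ can also be 0.

2 Here, we consider monomials of the form $p = c \cdot x_1^{e_1} \cdot \ldots \cdot x_d^{e_d}$ with coefficients $c \in \mathbb{Q}$.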


Example 25. Let $npe = q_3 \cdot 16^n + q_2 \cdot 9^n + q_1 = q_3 \cdot 16^n + q_2' \cdot 9^n + q_2'' \cdot 9^n + q_1' + q_1''$,
where $q_3 = -x_1^2$, $q_2 = q_2' + q_2''$, $q_2' = x_2$, $q_2'' = -x_3^3$, $q_1 = q_1' + q_1''$, $q_1' = x_3^3$,
$q_1'' = -x_3^5$, and $\psi = (x_3 > 0)$. We can choose $\Lambda = \{(x_3^3, 0, 1)\}$ since $\models \psi \to (x_3^3 > 0)$
and $\Gamma = \{(-x_3^5, 0, 1)\}$ since $\models \psi \to (-x_3^5 < 0)$. Moreover, we choose
$j_{(x_3^3,0,1)} = 9$, $i_{(x_3^3,0,1)} = 0$, which is possible since $(9,0) >_{\mathrm{lex}} (1,0)$. Similarly, we
choose $j_{(-x_3^5,0,1)} = 0$, $i_{(-x_3^5,0,1)} = 0$, since $(1,0) >_{\mathrm{lex}} (0,0)$. Thus, we replace
$x_3^3$ and $-x_3^5$ by the larger addends $x_3^3 \cdot 9^n$ and 0. The motivation for the latter
is that this removes all addends with exponent 5 from $npe$. The motivation
for the former is that then, we have both the addends $-x_3^3 \cdot 9^n$ and $x_3^3 \cdot 9^n$ in
the expression, which cancel out, i.e., this removes all addends with exponent 3.
Hence, we obtain $\lceil npe \rceil^{\psi}_{\Lambda,\Gamma} = p_2 \cdot 16^n + p_1 \cdot 9^n$ with $p_2 = -x_1^2$ and $p_1 = x_2$. To
find a suitable over-approximation which removes addends with high exponents,
our implementation uses a heuristic for the choice of $\Lambda$, $\Gamma$, $i_{(p,a,b)}$, and $j_{(p,a,b)}$.
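The replacement step of Definition 24 can be illustrated on Example 25. In this Python sketch, an addend $p \cdot n^a \cdot b^n$ is a tuple `((c, e), a, b)` for the monomial $p = c \cdot x^e$ (integer coefficients only, for simplicity), and the choices for $\Lambda$ and $\Gamma$ are passed as dictionaries from addend positions to the new pair $(j, i)$. The function name and data layout are assumptions. Only the lexicographic side conditions are checked; the semantic conditions $\models \psi \to (p > 0)$ and $\models \psi \to (p < 0)$ are left to the caller. Addends whose new base is 0 are dropped, as in Example 25 where $-x_3^5$ is replaced by 0.

```python
from typing import Dict, List, Tuple

Exponents = Tuple[int, ...]
Addend = Tuple[Tuple[int, Exponents], int, int]        # ((c, e), a, b): c * x^e * n^a * b^n
Grouped = Dict[Tuple[int, int], Dict[Exponents, int]]  # (b, a) -> polynomial {e: c}


def over_approximate(npe: List[Addend],
                     lam: Dict[int, Tuple[int, int]],
                     gamma: Dict[int, Tuple[int, int]]) -> Grouped:
    """Replace addend k by p * n^i * j^n if lam or gamma maps k to (j, i), then
    collect addends with equal (base, exponent). Only the lexicographic side
    conditions are checked; |= psi -> (p > 0) resp. (p < 0) is assumed."""
    grouped: Grouped = {}
    for k, ((c, e), a, b) in enumerate(npe):
        if k in lam:
            j, i = lam[k]
            assert (j, i) > (b, a), "Lambda requires (j, i) >_lex (b, a)"
            a, b = i, j
        elif k in gamma:
            j, i = gamma[k]
            assert (b, a) > (j, i), "Gamma requires (b, a) >_lex (j, i)"
            a, b = i, j
        if b == 0:
            continue  # p * n^a * 0^n vanishes for n >= 1 (treated as 0, as in Example 25)
        poly = grouped.setdefault((b, a), {})
        poly[e] = poly.get(e, 0) + c
        if poly[e] == 0:
            del poly[e]  # cancelling addends disappear
    return {key: poly for key, poly in grouped.items() if poly}


# Example 25 over (x1, x2, x3); all addends have a = 0.
npe = [
    ((-1, (2, 0, 0)), 0, 16),  # q3 * 16^n with q3 = -x1^2
    ((1, (0, 1, 0)), 0, 9),    # q2' * 9^n with q2' = x2
    ((-1, (0, 0, 3)), 0, 9),   # q2'' * 9^n with q2'' = -x3^3
    ((1, (0, 0, 3)), 0, 1),    # q1' = x3^3
    ((-1, (0, 0, 5)), 0, 1),   # q1'' = -x3^5
]
result = over_approximate(npe, lam={3: (9, 0)}, gamma={4: (0, 0)})
print(result)  # {(16, 0): {(2, 0, 0): -1}, (9, 0): {(0, 1, 0): 1}}, i.e. -x1^2 * 16^n + x2 * 9^n
```

With empty $\Lambda$ and $\Gamma$ the function just regroups $npe$, so the same code also yields the unchanged expression used for $\psi_1 = \mathit{true}$.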


The following lemma shows the soundness of the over-approximation
$\lceil npe \rceil^{\psi}_{\Lambda,\Gamma}$.


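Lemma 26 (Soundness of $\lceil npe \rceil^{\psi}_{\Lambda,\Gamma}$). Let $\psi$, $npe$, $\Lambda$, $\Gamma$, $i_{(p,a,b)}$, $j_{(p,a,b)}$, and
$\lceil npe \rceil^{\psi}_{\Lambda,\Gamma}$ be as in Definition 24, and let

$$D_{\lceil npe \rceil^{\psi}_{\Lambda,\Gamma}} = \max(\{\text{1-monotonicity threshold of } (j_{(p,a,b)}, i_{(p,a,b)}) \text{ and } (b,a) \mid (p,a,b) \in \Lambda\} \cup \{\text{1-monotonicity threshold of } (b,a) \text{ and } (j_{(p,a,b)}, i_{(p,a,b)}) \mid (p,a,b) \in \Gamma\}).$$

Then for all $e \in \Psi$ and all $n \geq D_{\lceil npe \rceil^{\psi}_{\Lambda,\Gamma}}$, we have $\sigma_e(\lceil npe \rceil^{\psi}_{\Lambda,\Gamma}) \geq \sigma_e(npe)$.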
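For Example 25, the inequality of Lemma 26 can also be seen directly: the over-approximation minus $npe$ equals $x_3^3 \cdot (9^n - 1) + x_3^5$, which is positive whenever $x_3 > 0$. The following Python sketch spot-checks this on a small grid of integer inputs with $x_3 > 0$ and $n \geq D = 1$ (the value of $D$ that Example 28 computes for this over-approximation). The grid bounds and function names are arbitrary illustrative choices, and a passing test is evidence, not a proof.

```python
from itertools import product


def npe_example25(x1: int, x2: int, x3: int, n: int) -> int:
    """npe from Example 25: -x1^2 * 16^n + (x2 - x3^3) * 9^n + x3^3 - x3^5."""
    return -x1 ** 2 * 16 ** n + (x2 - x3 ** 3) * 9 ** n + x3 ** 3 - x3 ** 5


def over_example25(x1: int, x2: int, x3: int, n: int) -> int:
    """Its over-approximation from Example 25: -x1^2 * 16^n + x2 * 9^n."""
    return -x1 ** 2 * 16 ** n + x2 * 9 ** n


D = 1  # value of D for this over-approximation (see Example 28)
# Grid restricted to psi = (x3 > 0); the grid bounds are arbitrary.
for x1, x2, x3 in product(range(-4, 5), range(-20, 21), range(1, 5)):
    for n in range(D, D + 8):
        assert over_example25(x1, x2, x3, n) >= npe_example25(x1, x2, x3, n)
print("Lemma 26 inequality holds on the sample grid")
```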

For any terminating tnn-loop $(\psi, \varphi, \eta)$, Theorem 27 now uses the new
concepts of Definitions 22 and 24 to compute a polynomial $\mathrm{sth}^{\sqcup}$ which is an upper
bound on the loop’s stabilization threshold (and hence, on its runtime). For any
atom $\alpha = (s_1 < s_2)$ (resp. $s_2 - s_1 > 0$) in the loop guard $\varphi$, let $npe_\alpha \in \mathbb{NPE}$ be
a poly-exponential expression which results from multiplying $(s_2 - s_1)[x/\mathrm{cl}_x^n]$
with the least common multiple of all denominators occurring in $(s_2 - s_1)[x/\mathrm{cl}_x^n]$.
Since the loop is terminating, for some of these atoms this expression will become
non-positive for large enough $n$, and our goal is to compute bounds on their
corresponding stabilization thresholds. First, one can replace $npe_\alpha$ by an
over-approximation $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}$ where $\psi' = (\psi \wedge \varphi)$ considers both the invariant $\psi$
and the guard $\varphi$. Let $\Psi' \subseteq \mathbb{Z}^d$ such that $e \in \Psi'$ iff $\sigma_e(\psi')$ holds. By Lemma
26 (i.e., $\sigma_e(\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}) \geq \sigma_e(npe_\alpha)$ for all $e \in \Psi'$), it suffices to compute a
bound on the stabilization threshold of $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}$ if it is always non-positive for
large enough $n$, because if $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}$ is non-positive, then so is $npe_\alpha$. We say
that an over-approximation $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}$ is eventually non-positive iff whenever
$\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma} \neq npe_\alpha$, then one can show that for all $e \in \Psi'$, $\sigma_e(\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma})$ is
always non-positive for large enough $n$.⁴ Using over-approximations $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}$
can be advantageous because $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}$ may contain fewer monomials than $npe_\alpha$,
and thus the construction $\sqcup$ from Definition 22 can yield a polynomial of lower
degree. So although $npe_\alpha$’s stabilization threshold might be smaller than the one
of $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}$, our technique might compute a smaller bound on the stabilization
threshold when considering $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}$ instead of $npe_\alpha$.

3 $\Lambda$ and $\Gamma$ do not have to contain all such tuples, but can be (possibly empty) subsets.


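Theorem 27 (Bound on Stabilization Threshold). Let $L = (\psi, \varphi, \eta)$ be a
terminating tnn-loop, let $\psi' = (\psi \wedge \varphi)$, and let $\mathrm{cl}_x^n$ be a normalized closed form
for $\eta$ with start value $n_0$. For every atom $\alpha = (s_1 < s_2)$ in $\varphi$, let $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}$ be an
eventually non-positive over-approximation of $npe_\alpha$ and let $D_\alpha = D_{\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma}}$.
If $\lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma} = \sum_{j=1}^{\ell} p_j \cdot n^{a_j} \cdot b_j^n$ with $p_j \neq 0$ for all $1 \leq j \leq \ell$ and $(b_\ell, a_\ell) >_{\mathrm{lex}} \ldots >_{\mathrm{lex}} (b_1, a_1)$,
then let $C_\alpha = \max\{1, N_2, M_2, \ldots, N_\ell, M_\ell\}$, where we have:

$$M_j = \begin{cases} 0, & \text{if } b_j = b_{j-1},\\ \text{1-monotonicity threshold of } (b_j, a_j) \text{ and } (b_{j-1}, a_{j-1} + 1), & \text{if } b_j > b_{j-1}, \end{cases}$$

$$N_j = \begin{cases} 1, & \text{if } j = 2,\\ mt', & \text{if } j = 3,\\ \max\{mt, mt'\}, & \text{if } j > 3. \end{cases}$$

Here, $mt'$ is the $(j-2)$-monotonicity threshold of $(b_{j-1}, a_{j-1})$ and $(b_{j-2}, a_{j-2})$,
and $mt = \max\{\text{1-monotonicity threshold of } (b_{j-2}, a_{j-2}) \text{ and } (b_i, a_i) \mid 1 \leq i \leq j-3\}$.
Let $Pol_\alpha = \{p_1, \ldots, p_{\ell-1}\}$, $Pol = \bigcup_{\alpha \text{ occurs in } \varphi} Pol_\alpha$, $C = \max\{C_\alpha \mid \alpha \text{ occurs in } \varphi\}$,
$D = \max\{D_\alpha \mid \alpha \text{ occurs in } \varphi\}$, and $\mathrm{sth}^{\sqcup} \in \mathbb{Z}[x]$ with
$\mathrm{sth}^{\sqcup} = 2 \cdot \sqcup Pol + \max\{n_0, C, D\}$. Then for all $e \in \Psi'$, we have $|\sigma_e|(\mathrm{sth}^{\sqcup}) \geq \mathrm{sth}_{(\psi,\varphi,\eta)}(e)$.
If the tnn-loop has the initial transition $t_0$ and looping transition
$t$, then $\mathcal{RB}_{\mathrm{glo}}(t_0) = 1$ and $\mathcal{RB}_{\mathrm{glo}}(t) = \mathrm{sth}^{\sqcup}$ is a global runtime bound for $L$.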
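The case distinction defining $C_\alpha$ is easy to get wrong by hand. The Python sketch below computes $C_\alpha$ from the sorted pairs $(b_1, a_1), \ldots, (b_\ell, a_\ell)$, using the same bounded brute-force assumption for monotonicity thresholds as a plain search (the helper names `mt` and `c_alpha` are hypothetical). On the two over-approximations used in Example 28 it yields $C_\alpha = 1$ in both cases.

```python
from typing import List, Tuple


def mt(b1: int, a1: int, b2: int, a2: int, k: int = 1, search_limit: int = 1000) -> int:
    """k-monotonicity threshold (Definition 21) by bounded brute force; assumes the
    inequality n**a1 * b1**n > k * n**a2 * b2**n holds for all n >= search_limit."""
    assert (b1, a1) > (b2, a2) and k >= 1
    last_failure = -1
    for n in range(search_limit):
        if not n ** a1 * b1 ** n > k * n ** a2 * b2 ** n:
            last_failure = n
    return last_failure + 1


def c_alpha(pairs: List[Tuple[int, int]]) -> int:
    """C_alpha of Theorem 27 for pairs [(b_1, a_1), ..., (b_l, a_l)] sorted so that
    (b_l, a_l) >_lex ... >_lex (b_1, a_1). Indices below are 1-based as in the text."""
    assert all(pairs[j] > pairs[j - 1] for j in range(1, len(pairs)))
    b = {j: pair[0] for j, pair in enumerate(pairs, start=1)}
    a = {j: pair[1] for j, pair in enumerate(pairs, start=1)}
    candidates = [1]
    for j in range(2, len(pairs) + 1):
        # M_j
        if b[j] == b[j - 1]:
            m_j = 0
        else:  # b_j > b_{j-1}
            m_j = mt(b[j], a[j], b[j - 1], a[j - 1] + 1)
        # N_j
        if j == 2:
            n_j = 1
        else:
            mt_prime = mt(b[j - 1], a[j - 1], b[j - 2], a[j - 2], k=j - 2)
            if j == 3:
                n_j = mt_prime
            else:
                mt_all = max(mt(b[j - 2], a[j - 2], b[i], a[i]) for i in range(1, j - 2))
                n_j = max(mt_all, mt_prime)
        candidates += [m_j, n_j]
    return max(candidates)


# Example 28, psi_1 = true: bases 1, 9, 16 with all a_j = 0.
print(c_alpha([(1, 0), (9, 0), (16, 0)]))  # 1
# Example 28, psi_2 = (x3 > 0): bases 9, 16 with all a_j = 0.
print(c_alpha([(9, 0), (16, 0)]))          # 1
```

The constant part of $\mathrm{sth}^{\sqcup}$ is then simply `max(n0, C, D)`; the polynomial part $2 \cdot \sqcup Pol$ can be obtained with a routine like the one for Definition 22.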


Example 28. The guard $\varphi$ of the tnn-loop in Example 17 has the atoms $\alpha = (x_1^2 + x_3^5 < x_2)$, $\alpha' = (0 < x_1)$, and $\alpha'' = (0 < -x_1)$ (since $x_1 \neq 0$ is transformed
into $\alpha' \vee \alpha''$). When instantiating the variables by the closed forms of Example 19
with start value $n_0 = 0$, Theorem 27 computes the bound 1 on the stabilization
thresholds for $\alpha'$ and $\alpha''$. So the only interesting atom is $\alpha = (0 < s_2 - s_1)$ for
$s_1 = x_1^2 + x_3^5$ and $s_2 = x_2$. We get $npe_\alpha = (s_2 - s_1)[x/\mathrm{cl}_x^n] = q_3 \cdot 16^n + q_2 \cdot 9^n + q_1$,
with $q_i$ as in Example 25.


4 This can be shown similarly to the proof of (2) for (non-)termination of the loop. Thus,
we transform $\exists x \in \mathbb{Z}^d, m \in \mathbb{N}.\ \forall n \in \mathbb{N}_{\geq m}.\ \psi' \wedge \lceil npe_\alpha \rceil^{\psi'}_{\Lambda,\Gamma} > 0$ into an existential
formula as in [15] and try to prove its invalidity by an SMT solver.

In the program of Fig. 1, the corresponding self-loop $t_5$ has two entry
transitions $t_4$ and $t_1$, which result in two tnn-loops with the update-invariants
$\psi_1 = \mathit{true}$ resulting from transition $t_4$ and $\psi_2 = (x_3 > 0)$ from $t_1$. So $\psi_2$ is
an update-invariant of $t_5$ which always holds when reaching $t_5$ via transition $t_1$.

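For $\psi_1 = \mathit{true}$, we choose $\Lambda = \Gamma = \emptyset$, i.e., $\lceil npe_\alpha \rceil^{\psi_1'}_{\Lambda,\Gamma} = npe_\alpha$. So we have
$b_3 = 16$, $b_2 = 9$, $b_1 = 1$, and $a_j = 0$ for all $1 \leq j \leq 3$. We obtain

$M_2 = 0$, as 0 is the 1-monotonicity threshold of $(9,0)$ and $(1,1)$,
$M_3 = 0$, as 0 is the 1-monotonicity threshold of $(16,0)$ and $(9,1)$,
$N_2 = 1$ by definition, and $N_3 = 1$, as 1 is the 1-monotonicity threshold of $(9,0)$ and $(1,0)$.

Hence, we get $C = C_\alpha = \max\{1, N_2, M_2, N_3, M_3\} = 1$. So we obtain the runtime
bound $\mathrm{sth}^{\sqcup}_{\psi_1} = 2 \cdot \sqcup\{q_1, q_2\} + \max\{n_0, C_\alpha\} = 2 \cdot x_2 + 2 \cdot x_3^3 + 2 \cdot x_3^5 + 1$ for the loop
$t_5 \star t_5$ w.r.t. $\psi_1$. By Lemma 16, this means that $2 \cdot \mathrm{sth}^{\sqcup}_{\psi_1} + 1 = 4 \cdot x_2 + 4 \cdot x_3^3 + 4 \cdot x_3^5 + 3$
is a runtime bound for the loop at transition $t_5$.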

For the update-invariant $\psi_2 = (x_3 > 0)$, we use the over-approximation
$\lceil npe_\alpha \rceil^{\psi_2'}_{\Lambda,\Gamma} = p_2 \cdot 16^n + p_1 \cdot 9^n$ with $p_2 = -x_1^2$ and $p_1 = x_2$ from Example 25,
where $\psi_2' = (\psi_2 \wedge \varphi)$ implies that it is always non-positive for large enough $n$. Now
we obtain $M_2 = 0$ (the 1-monotonicity threshold of $(16,0)$ and $(9,1)$) and $N_2 = 1$,
so $C = C_\alpha = \max\{1, N_2, M_2\} = 1$. Moreover, we have $D_\alpha = \max\{1, 0\} = 1$,
since

1 is the 1-monotonicity threshold of $(9,0)$ and $(1,0)$, and

0 is the 1-monotonicity threshold of $(1,0)$ and $(0,0)$.

We now get the tighter bound $\mathrm{sth}^{\sqcup}_{\psi_2} = 2 \cdot \sqcup\{p_1\} + \max\{n_0, C_\alpha, D_\alpha\} = 2 \cdot x_2 + 1$
for $t_5 \star t_5$. So $t_5$’s runtime bound is $2 \cdot \mathrm{sth}^{\sqcup}_{\psi_2} + 1 = 4 \cdot x_2 + 3$ when using invariant
$\psi_2$.
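As a sanity check of Example 28, one can simulate the self-loop $t_5$ directly. The following Python sketch assumes the guard $x_1^2 + x_3^5 < x_2 \wedge x_1 \neq 0$ and the simultaneous update $x_1 \leftarrow -2 \cdot x_1$, $x_2 \leftarrow 3 \cdot x_2 - 2 \cdot x_3^3$ (with $x_3$ unchanged), as recovered from Examples 25 and 28 and Fig. 2. It counts how often $t_5$ can be applied and compares the count with $4 \cdot x_2 + 4 \cdot x_3^3 + 4 \cdot x_3^5 + 3$ (evaluated on absolute values) and, when $x_3 > 0$, with $4 \cdot x_2 + 3$. The function names and grid are illustrative; passing on a finite grid does not prove the bounds, it only guards against transcription errors.

```python
from itertools import product


def runtime_t5(x1: int, x2: int, x3: int, max_steps: int = 10_000) -> int:
    """How often t5 can be applied from (x1, x2, x3). Assumed guard:
    x1^2 + x3^5 < x2 and x1 != 0; assumed simultaneous update:
    x1 <- -2*x1, x2 <- 3*x2 - 2*x3^3, x3 unchanged."""
    steps = 0
    while x1 ** 2 + x3 ** 5 < x2 and x1 != 0:
        x1, x2 = -2 * x1, 3 * x2 - 2 * x3 ** 3
        steps += 1
        if steps > max_steps:
            raise RuntimeError("no termination within max_steps")
    return steps


def bound_psi1(x2: int, x3: int) -> int:
    """4*x2 + 4*x3^3 + 4*x3^5 + 3, evaluated on absolute values."""
    return 4 * abs(x2) + 4 * abs(x3) ** 3 + 4 * abs(x3) ** 5 + 3


def bound_psi2(x2: int) -> int:
    """4*x2 + 3, evaluated on absolute values; claimed only under psi2 = (x3 > 0)."""
    return 4 * abs(x2) + 3


for x1, x2, x3 in product(range(-6, 7), range(-60, 61), range(-3, 4)):
    steps = runtime_t5(x1, x2, x3)
    assert steps <= bound_psi1(x2, x3)
    if x3 > 0:
        assert steps <= bound_psi2(x2)

print(runtime_t5(1, 100, 0))  # 17, since 4^k < 100 * 3^k holds exactly for k <= 16
```

The actual number of iterations grows only logarithmically in $x_2$ here, so both polynomial bounds are far from tight; the point of Theorem 27 is that they are sound and separate the influence of the individual variables.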


Theorem 29 shows how the technique of Lemma 16 and Theorem 27 can
be used to compute local runtime bounds for twn-loops whenever such loops
occur within an integer program. To this end, one needs the new Theorem 11,
where, in contrast to [6,18], these local bounds do not have to result from ranking
functions.

To turn a self-loop $t$ and $r \in \mathcal{E}_{\{t\}}$ from a larger program $\mathcal{P}$ into a twn-loop
$(\psi, \varphi, \eta)$, we use $t$’s guard $\varphi$ and update $\eta$. To obtain an update-invariant $\psi$, our
implementation uses the Apron library [23] for computing invariants on a version
of the full program where we remove all entry transitions $\mathcal{E}_{\{t\}}$ except $r$.⁵ From
the invariants computed for $t$, we take those that are also update-invariants of $t$.


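Theorem 29 (Local Bounds for Twn-Loops). Let $\mathcal{P} = (\mathcal{PV}, \mathcal{L}, \ell_0, \mathcal{T})$ be
an integer program with $\mathcal{PV}' = \{x_1, \ldots, x_d\} \subseteq \mathcal{PV}$. Let $t = (\ell, \varphi, \eta, \ell) \in \mathcal{T}$ with
$\varphi \in \mathcal{F}(\mathcal{PV}')$, $\eta(v) \in \mathbb{Z}[\mathcal{PV}']$ for all $v \in \mathcal{PV}'$, and $\eta(v) = v$ for all $v \in \mathcal{PV} \setminus \mathcal{PV}'$.
For any entry transition $r \in \mathcal{E}_{\{t\}}$, let $\psi \in \mathcal{F}(\mathcal{PV}')$ such that $\models \psi \to \eta(\psi)$ and
such that $\sigma(\psi)$ holds whenever there is a $\sigma_0 \in \Sigma$ with $(\ell_0, \sigma_0) \to^* \circ \to_r (\ell, \sigma)$.
If $L = (\psi, \varphi, \eta)$ is a terminating tnn-loop, then let $\mathcal{RB}_{\mathrm{loc}}(\to_r \{t\}) = \mathrm{sth}^{\sqcup}$, where
$\mathrm{sth}^{\sqcup}$ is defined as in Theorem 27. If $L$ is a terminating twn-loop but no tnn-loop,
let $\mathcal{RB}_{\mathrm{loc}}(\to_r \{t\}) = 2 \cdot \mathrm{sth}^{\sqcup} + 1$, where $\mathrm{sth}^{\sqcup}$ is the bound of Theorem 27
computed for $L \star L$. Otherwise, let $\mathcal{RB}_{\mathrm{loc}}(\to_r \{t\}) = \omega$. Then $\mathcal{RB}_{\mathrm{loc}}$ is a local
runtime bound for $\{t\} = \mathcal{T}'_{>} = \mathcal{T}'$ in the program $\mathcal{P}$.

5 Taking invariants of the full program into account when computing local bounds for $t$
is possible since, in contrast to [6,18], our definition of local bounds from Definition
9 is restricted to states that are reachable from an initial configuration $(\ell_0, \sigma_0)$.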


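Example 30. In Fig. 1, we consider the self-loop $t_5$ with $\mathcal{E}_{\{t_5\}} = \{t_4, t_1\}$ and the
update-invariants $\psi_1 = \mathit{true}$ resp. $\psi_2 = (x_3 > 0)$. For $t_5$’s guard $\varphi$ and update
$\eta$, both $(\psi_i, \varphi, \eta)$ are terminating twn-loops (see Example 14), i.e., (2) is invalid.

By Theorem 29 and Example 28, $\mathcal{RB}_{\mathrm{loc}}$ with $\mathcal{RB}_{\mathrm{loc}}(\to_{t_4} \{t_5\}) = 4 \cdot x_2 + 4 \cdot x_3^3 + 4 \cdot x_3^5 + 3$
and $\mathcal{RB}_{\mathrm{loc}}(\to_{t_1} \{t_5\}) = 4 \cdot x_2 + 3$ is a local runtime bound for
$\{t_5\} = \mathcal{T}'_{>} = \mathcal{T}'$ in the program of Fig. 1. As shown in Example 12, Theorem 11
then yields the global runtime bound $\mathcal{RB}_{\mathrm{glo}}(t_5) = 8 \cdot x_4 \cdot x_5 + 13006 \cdot x_4$.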


5 Local Runtime Bounds for Twn-Cycles 


Section 4 introduced a technique to determine local runtime bounds for
twn-self-loops in a program. To increase its applicability, we now extend it to larger
cycles. For every entry transition of the cycle, we chain the transitions of the
cycle, starting with the transition which follows the entry transition. In this way,
we obtain loops consisting of a single transition. If the chained loop is a twn-loop,
we can apply Theorem 29 to compute a local runtime bound. Any local bound
on the chained transition is also a bound on each of the original transitions.⁶

By Theorem 29, we obtain a bound on the number of evaluations of the 
complete cycle. However, we also have to consider a partial execution which 
stops before traversing the full cycle. Therefore, we increase every local runtime 
bound by 1. 

Note that this replacement of a cycle by a self-loop which results from
chaining its transitions is only sound for simple cycles. A cycle is simple if each
iteration through the cycle can only be done in a unique way. So the cycle must not
have any subcycles, and there must not be any nondeterminism concerning
the next transition to be taken. Formally, $C = \{t_1, \ldots, t_n\} \subseteq \mathcal{T}$ is a simple cycle
if $C$ does not contain temporary variables and there are pairwise different
locations $\ell_1, \ldots, \ell_n$ such that $t_i = (\ell_i, \_, \_, \ell_{i+1})$ for $1 \leq i \leq n-1$ and $t_n = (\ell_n, \_, \_, \ell_1)$.
This ensures that if there is an evaluation with $\to_{t_i} \circ \to^*_{C \setminus \{t_i\}} \circ \to_{t_i}$, then the
steps with $\to^*_{C \setminus \{t_i\}}$ have the form $\to_{t_{i+1}} \circ \ldots \circ \to_{t_n} \circ \to_{t_1} \circ \ldots \circ \to_{t_{i-1}}$.
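The structural part of this definition can be checked with a short Python sketch. It takes transitions as hypothetical `(name, source, target)` triples and ignores guards, updates and the temporary-variable condition. It returns the transitions in the cyclic order $t_1, \ldots, t_n$ if every location is left by exactly one transition and following the targets visits all transitions once before returning to $t_1$; otherwise it returns `None`. The function name and the location names in the usage example are made up.

```python
from typing import Dict, List, Optional, Tuple

Transition = Tuple[str, str, str]  # (name, source location, target location)


def order_simple_cycle(transitions: List[Transition]) -> Optional[List[Transition]]:
    """Return the transitions as t_1, ..., t_n with t_i = (l_i, _, _, l_{i+1}) and
    t_n = (l_n, _, _, l_1), starting with transitions[0], or None if the locations
    do not form a simple cycle. Guards, updates and temporary variables are ignored."""
    if not transitions:
        return None
    by_source: Dict[str, Transition] = {}
    for t in transitions:
        if t[1] in by_source:  # two transitions leave the same location
            return None
        by_source[t[1]] = t
    ordered = [transitions[0]]
    while True:
        nxt = by_source.get(ordered[-1][2])
        if nxt is None:  # the target is not the start location of any transition in C
            return None
        if nxt == transitions[0]:  # back at l_1
            break
        if nxt in ordered:  # a subcycle that never returns to l_1
            return None
        ordered.append(nxt)
    return ordered if len(ordered) == len(transitions) else None


# Hypothetical location names for the cycle {t5a, t5b} of Fig. 2.
print([t[0] for t in order_simple_cycle([("t5a", "l3", "l5"), ("t5b", "l5", "l3")])])
```

Requiring distinct source locations is what rules out nondeterministic choices inside the cycle, and requiring that all transitions are visited before returning to $\ell_1$ rules out subcycles.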

Algorithm 1 describes how to compute a local runtime bound for a simple
cycle $C = \{t_1, \ldots, t_n\}$ as above. In the loop of Line 2, we iterate over all entry
transitions $r$ of $C$. If $r$ reaches the transition $t_i$, then in Lines 3 and 4 we chain
$t_i \star \ldots \star t_n \star t_1 \star \ldots \star t_{i-1}$, which corresponds to one iteration of the cycle starting
in $t_i$. If a suitable renaming (and thus also reordering) of the variables turns the
chained transition into a twn-loop, then we use Theorem 29 to compute a local
runtime bound $\mathcal{RB}_{\mathrm{loc}}(\to_r C)$ in Lines 5 and 6. If the chained transition does not
give rise to a twn-loop, then $\mathcal{RB}_{\mathrm{loc}}(\to_r C)$ is $\omega$ (Line 1). In practice, to use the
twn-technique for a transition $t$ in a program, our tool KoAT searches for those
simple cycles that contain $t$ and where the chained cycle is a twn-loop. Among
those cycles it chooses the one with the smallest runtime bounds for its entry
transitions.

6 This is sufficient for our improved definition of local bounds in Definition 9 where, in
contrast to [6,18], we do not require a bound on the sum but only on each transition
in the considered set $\mathcal{T}'$. Moreover, here we again benefit from our extension to
compute individual local bounds for different entry transitions.

Algorithm 1. Algorithm to Compute Local Runtime Bounds for Cycles
input : A program $(\mathcal{PV}, \mathcal{L}, \ell_0, \mathcal{T})$ and a simple cycle $C = \{t_1, \ldots, t_n\} \subseteq \mathcal{T}$
output : A local runtime bound $\mathcal{RB}_{\mathrm{loc}}$ for $C = \mathcal{T}'_{>} = \mathcal{T}'$
1 Initialize $\mathcal{RB}_{\mathrm{loc}}$: $\mathcal{RB}_{\mathrm{loc}}(\to_r C) = \omega$ for all $r \in \mathcal{E}_C$.
2 forall $r \in \mathcal{E}_C$ do
3     Let $i \in \{1, \ldots, n\}$ such that $r$’s target location is the start location $\ell_i$ of $t_i$.
4     Let $t = t_i \star \ldots \star t_n \star t_1 \star \ldots \star t_{i-1}$.
5     if there exists a renaming $\pi$ of $\mathcal{PV}$ such that $\pi(t)$ results in a twn-loop then
6         Set $\mathcal{RB}_{\mathrm{loc}}(\to_r C) \leftarrow \pi^{-1}(1 + \text{result of Theorem 29 on } \pi(t) \text{ and } \pi(r))$.
7 return local runtime bound $\mathcal{RB}_{\mathrm{loc}}$.

[Figure: integer program graph containing the cycle $C = \{t_{5a}, t_{5b}\}$, where $t_{5a}$ has the guard $\varphi = x_1^2 + x_3^5 < x_2 \wedge x_1 \neq 0$ and the update $\eta(x_1) = -2 \cdot x_1$, and $t_{5b}$ has the update $\eta(x_2) = 3 \cdot x_2 - 2 \cdot x_3^3$.]
Fig. 2. An Integer Program with a Nested Non-Self-Loop
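The chaining in Lines 3 and 4 of Algorithm 1 can be sketched in Python by treating a transition as a pair of functions on states. This is a simplification for illustration (no temporary variables, no symbolic representation), and the names `chain` and `rotate` are hypothetical. The guard of the chained transition requires every guard to hold on the state reached so far, and its update applies the updates in order. Using $t_{5a}$ and $t_{5b}$ as read off Fig. 2, under the assumption that $t_{5b}$ has the guard $\mathit{true}$, the chained transition agrees with $t_5$ of Fig. 1 on sample states, which is the relationship stated in Example 32.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Transition = Tuple[Callable[[State], bool], Callable[[State], State]]  # (guard, update)


def chain(transitions: List[Transition]) -> Transition:
    """t_1 * ... * t_m: the guard demands each guard on the state reached so far,
    the update applies all updates one after another."""
    def guard(s: State) -> bool:
        for g, u in transitions:
            if not g(s):
                return False
            s = u(s)
        return True

    def update(s: State) -> State:
        for _, u in transitions:
            s = u(s)
        return s

    return guard, update


def rotate(cycle: List[Transition], i: int) -> List[Transition]:
    """t_i, ..., t_n, t_1, ..., t_{i-1} for an entry transition reaching t_i (1-based i)."""
    return cycle[i - 1:] + cycle[:i - 1]


# t5a and t5b as read off Fig. 2; the guard of t5b is assumed to be true.
t5a: Transition = (lambda s: s["x1"] ** 2 + s["x3"] ** 5 < s["x2"] and s["x1"] != 0,
                   lambda s: {**s, "x1": -2 * s["x1"]})
t5b: Transition = (lambda s: True,
                   lambda s: {**s, "x2": 3 * s["x2"] - 2 * s["x3"] ** 3})
# t5 of Fig. 1: guard of t5a, both updates applied simultaneously.
t5: Transition = (t5a[0],
                  lambda s: {**s, "x1": -2 * s["x1"], "x2": 3 * s["x2"] - 2 * s["x3"] ** 3})

guard, update = chain(rotate([t5a, t5b], 1))  # entry transitions t1, t4 reach t5a
for x1, x2, x3 in product(range(-3, 4), range(-10, 11), range(-2, 3)):
    s = {"x1": x1, "x2": x2, "x3": x3}
    assert guard(s) == t5[0](s) and update(s) == t5[1](s)
```

Because $t_{5a}$ does not modify $x_2$ or $x_3$, applying the two updates one after another coincides with the simultaneous update of $t_5$ here; in general the chained update is the composition, not the simultaneous combination.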


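Theorem 31 (Correctness of Algorithm 1). Let $\mathcal{P} = (\mathcal{PV}, \mathcal{L}, \ell_0, \mathcal{T})$ be an
integer program and let $C \subseteq \mathcal{T}$ be a simple cycle in $\mathcal{P}$. Then the result $\mathcal{RB}_{\mathrm{loc}} : \mathcal{E}_C \to \mathcal{B}$
of Algorithm 1 is a local runtime bound for $C = \mathcal{T}'_{>} = \mathcal{T}'$.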


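Example 32. We apply Algorithm 1 on the cycle $C = \{t_{5a}, t_{5b}\}$ of the program
in Fig. 2. $C$’s entry transitions $t_1$ and $t_4$ both end in $\ell_3$. Chaining $t_{5a}$ and $t_{5b}$
yields the transition $t_5$ of Fig. 1, i.e., $t_5 = t_{5a} \star t_{5b}$. Thus, Algorithm 1 essentially
transforms the program of Fig. 2 into Fig. 1. As in Examples 28 and 30, we obtain
$\mathcal{RB}_{\mathrm{loc}}(\to_{t_4} C) = 1 + (2 \cdot \mathrm{sth}^{\sqcup}_{\psi_1} + 1) = 4 \cdot x_2 + 4 \cdot x_3^3 + 4 \cdot x_3^5 + 4$ and
$\mathcal{RB}_{\mathrm{loc}}(\to_{t_1} C) = 1 + (2 \cdot \mathrm{sth}^{\sqcup}_{\psi_2} + 1) = 4 \cdot x_2 + 4$, resulting in the global runtime bound
$\mathcal{RB}_{\mathrm{glo}}(t_{5a}) = \mathcal{RB}_{\mathrm{glo}}(t_{5b}) = 8 \cdot x_4 \cdot x_5 + 13008 \cdot x_4$, which again yields $rc(\sigma_0) \in \mathcal{O}(n^2)$.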


6 Conclusion and Evaluation 


We showed that results on subclasses of programs with computable complexity
bounds like [19] are not only theoretically interesting, but they have an important
practical value. To our knowledge, our paper is the first to integrate such
results into an incomplete approach for automated complexity analysis like [6,18].
For this integration, we developed several novel contributions which extend and
improve the previous approaches in [6,18,19] substantially:


(a) We extended the concept of local runtime bounds such that they can now 
depend on entry transitions (Definition 9). 

(b) We generalized the computation of global runtime bounds such that one can 
now lift arbitrary local bounds to global bounds (Theorem 11). In particular, 
the local bounds might be due to either ranking functions or twn-loops. 

(c) We improved the technique for the computation of bounds on twn-loops such 
that these bounds now take the roles of the different variables into account 
(Definition 22, Corollary 23, and Theorem 27). 

(d) We extended the notion of twn-loops by update-invariants and developed 
a new over-approximation of their closed forms which takes invariants into 
account (Definitions 13 and 24, Lemma 26, and Theorem 27).

(e) We extended the handling of twn-loops to twn-cycles (Theorem 31). 


The need for these improvements is demonstrated by our leading example in
Fig. 1 (where the contributions (a)–(d) are needed to infer quadratic runtime
complexity) and by the example in Fig. 2 (which illustrates (e)). In this way, the
power of automated complexity analysis is increased substantially, because now
one can also infer runtime bounds for programs containing non-linear arithmetic.

To demonstrate the power of our approach, we evaluated the integration
of our new technique to infer local runtime bounds for twn-cycles in our
re-implementation of the tool KoAT (written in OCaml) and compared the results to
other state-of-the-art tools. To distinguish our re-implementation of KoAT from
the original version of the tool from [6], let KoAT1 refer to the tool from [6] and
let KoAT2 refer to our new re-implementation. KoAT2 applies a local control-flow
refinement technique [18] (using the tool iRankFinder [8]) and preprocesses
the program in the beginning, e.g., by extending the guards of transitions by
invariants inferred using the Apron library [23]. For all occurring SMT problems,
KoAT2 uses Z3 [28]. We tested the following configurations of KoAT2, which
differ in the techniques used for the computation of local runtime bounds:


• KoAT2+RF only uses linear ranking functions to compute local runtime bounds

• KoAT2+MΦRF5 uses multiphase-linear ranking functions of depth ≤ 5

• KoAT2+TWN only uses twn-cycles to compute local runtime bounds (Algorithm 1)

• KoAT2+TWN+RF uses Algorithm 1 for twn-cycles and linear ranking functions

• KoAT2+TWN+MΦRF5 uses Algorithm 1 for twn-cycles and MΦRFs of depth ≤ 5


Existing approaches for automated complexity analysis are already very
powerful on programs that only use linear arithmetic in their guards and updates.

Tool                | O(1) | O(n)    | O(n^2) | O(n^>2) | O(EXP) | < ∞      | AVG+(s) | AVG(s)
KoAT2 + TWN + MΦRF5 | 26   | 231 (5) | 73 (5) | 13 (4)  | 1 (1)  | 344 (15) | 8.72    | 23.93
KoAT2 + TWN + RF    | 27   | 227 (5) | 73 (5) | 13 (4)  | 1 (1)  | 341 (15) | 8.11    | 19.77
KoAT2 + MΦRF5       | 24   | 226 (1) | 68     | 10      | 0      | 328 (1)  | 8.23    | 21.63
KoAT2 + RF          | 25   | 214 (1) | 68     | 10      | 1      | 318 (1)  | 8.49    | 16.56
MaxCore             | 23   | 216 (2) | 66     | 7       | 0      | 312 (2)  | 2.02    | 5.31
CoFloCo             | 22   | 196 (1) | 66     | 5       | 0      | 289 (1)  | 0.62    | 2.66
KoAT1               | 25   | 169 (1) | 74     | 12      | 6      | 286 (1)  | 1.77    | 2.77
Loopus              | 17   | 170 (1) | 49     | 5 (1)   | 0      | 241 (2)  | 0.42    | 0.43
KoAT2 + TWN         | 20   | 111 (4) | 3 (2)  | 2 (2)   | 0      | 136 (9)  | 2.54    | 26.59


Fig. 3. Evaluation on the Collection CINT* 


The corresponding benchmarks for Complexity of Integer Transition Systems
(CITS) and Complexity of C Integer Programs (CINT) from the Termination
Problems Data Base [33], which is used in the annual Termination and
Complexity Competition (TermComp) [17], contain almost only examples with linear
arithmetic. Here, the existing tools already infer finite runtimes for more than
89% of those examples in the collections CITS and CINT where this might⁷ be
possible.

The main benefit of our new integration of the twn-technique is that in this 
way one can also infer finite runtime bounds for programs that contain non-linear 
guards or updates. To demonstrate this, we extended both collections CITS and 
CINT by 20 examples that represent typical such programs, including several 
benchmarks from the literature [3,14,15, 18,20,34], as well as our programs from 
Fig. 1 and 2. See [27] for a detailed list and description of these examples. 

Figure 3 presents our evaluation on the collection CINT*, consisting of the 484 
examples from CINT and our 20 additional examples for non-linear arithmetic. 
We refer to [27] for the (similar) results on the corresponding collection CITS*. 

In the C programs of CINT*, all variables are interpreted as integers over $\mathbb{Z}$
(i.e., without overflows). For KoAT2 and KoAT1, we used Clang [7] and llvm2kittel
[10] to transform C programs into integer transition systems as in Definition 2.
We compare KoAT2 with KoAT1 [6] and the tools CoFloCo [11,12], MaxCore [2] 
with CoFloCo in the backend, and Loopus [31]. We do not compare with RaML 
[21], as it does not support programs whose complexity depends on (possibly 
negative) integers (see [29]). We also do not compare with PUBS [1], because as 
stated in [9] by one of its authors, CoFloCo is stronger than PUBS. For the same 
reason, we only consider MaxCore with the backend CoFloCo instead of PUBS. 

All tools were run inside an Ubuntu Docker container on a machine with an 
AMD Ryzen 7 3700X octa-core CPU and 48GB of RAM. As in TermComp, we 
applied a timeout of 5 min for every program. 

In Fig. 3, the first entry in every cell denotes the number of benchmarks from
CINT* where the respective tool inferred the corresponding bound. The number
in brackets is the corresponding number of benchmarks when only regarding
our 20 new examples for non-linear arithmetic. The runtime bounds computed
by the tools are compared asymptotically as functions which depend on the
largest initial absolute value $n$ of all program variables. So, for instance, there
are 26 + 231 = 257 programs in CINT* (and 5 of them come from our new
examples) where KoAT2+TWN+MΦRF5 can show that $rc(\sigma_0) \in \mathcal{O}(n)$ holds for
all initial states $\sigma_0$ where $|\sigma_0(v)| \leq n$ for all $v \in \mathcal{PV}$. For 26 of these programs,
KoAT2+TWN+MΦRF5 can even show that $rc(\sigma_0) \in \mathcal{O}(1)$, i.e., their runtime
complexity is constant. Overall, this configuration succeeds on 344 examples, i.e.,
“< ∞” is the number of examples where a finite bound on the runtime complexity
could be computed by the respective tool within the time limit. “AVG+(s)” is
the average runtime of the tool on successful runs in seconds, i.e., where the tool
inferred a finite time bound before reaching the timeout, whereas “AVG(s)” is
the average runtime of the tool on all runs including timeouts.

7 The tool LoAT [13,16] proves unbounded runtime for 217 of the 781 examples from
CITS and iRankFinder [4,8] proves non-termination for 118 of 484 programs of CINT.

On the original benchmarks CINT where very few examples contain non-linear
arithmetic, integrating TWN into a configuration that already uses
multiphase-linear ranking functions does not increase power much: KoAT2+TWN+MΦRF5
succeeds on 344 − 15 = 329 such programs and KoAT2+MΦRF5 solves 328 − 1 =
327 examples. On the other hand, if one only has linear ranking functions, then an
improvement via our twn-technique has similar effects as an improvement with
multiphase-linear ranking functions (here, the success rate of KoAT2+MΦRF5 is
similar to KoAT2+TWN+RF which solves 341 − 15 = 326 such programs).

But the main benefit of our technique is that it also allows us to successfully
handle examples with non-linear arithmetic. Here, our new technique is significantly
more powerful than previous ones. Other tools and configurations without TWN
in Fig. 3 solve at most 2 of the 20 new examples. In contrast, KoAT2+TWN+RF
and KoAT2+TWN+MΦRF5 both succeed on 15 of them.⁸ In particular, our running
examples from Fig. 1 and 2 and even isolated twn-loops like $t_5$ or $t_5 \star t_5$
from Examples 14 and 17 can only be solved by KoAT2 with our twn-technique.

To summarize, our evaluations show that KoAT2 with the added twn-technique
outperforms all other configurations and tools for automated complexity
analysis on all considered benchmark sets (i.e., CINT*, CINT, CITS*, and
CITS) and it is the only tool which is also powerful on examples with non-linear
arithmetic.

KoAT’s source code, a binary, and a Docker image are available at
https://aprove-developers.github.io/KoAT_TWN/. The website also has details on our
experiments and web interfaces to run KoAT’s configurations directly online.


Acknowledgments. We are indebted to M. Hark for many fruitful discussions about 
complexity, twn-loops, and KoAT. We are grateful to S. Genaim and J. J. Doménech 
for a suitable version of iRankFinder which we could use for control-flow refinement 
in KoAT’s backend. Moreover, we thank A. Rubio and E. Martin-Martin for a static 
binary of MaxCore, A. Flores-Montoya and F. Zuleger for help in running CoFloCo and 
Loopus, F. Frohn for help and advice, and the reviewers for their feedback to improve 
the paper. 


8 One is the non-terminating leading example of [15], so at most 19 might terminate.